1 CGCGCCTGCC TCCTCTCCCC AGCCCTGAGC TGCCCCTCCC ACTGCCTTTC

51 CTTCTTCCCC CCAQTCAGAA GCTTCGCGAC GGCCCACAGA GCCCGTGG6G 

101 TGCCCGACCC TACGCCACCT CCGGCCGGGA GAAAGCCCAC CCTCTCCCGC 

151 eCCCCAGOAA ACCGCCGGCG TTCGCCGCTG CGCAGACCCA TCGAATTCTC

201 CTGGCTGGAG ACGCGCTGGG CGCGGCCCTT TTACCTGGCG TTCGTGT7CT 

251 GCCTGGCCCT GGGGCTGCTG CACGCCATTA AGCTGTACCT CCGGAOGCAG 

301 CGGCTGCTGC GGGACCTGCG CCCCTTCCCA GCGCCCCCCA CCCACTGOTT 

351 CCTTGGGCAC CAGAACTTTA TTCAGGATGA TAACATGGAG AAGCTTGAGC

401 AAATTATTGA AAAATACCCT CCTGCCTTCC CTTTCTCGAT TCGCCCCTTT 

451 CAGGCATTTT TCTGTATCTA TGACCCAGAC TATGCAAAGA CACTTCTGAG

501 CAGAACAGAT CCCAAGTCCC GGTACCTGCA CAAATTCTCA CCTCCACTTC

551 TTGGAAAAGG ACTAGCGGCT CTAGACGGAC CCAAGTGGTT CCACCATCGT

601 CX;CCTACTAA CTCCTGCATT CCATTTTAAC ATCCTGAAAC CATACATTCA 

651 GGTGATGGCT CATTCTGTGA AAATCATGCT GGATAACTGG GAGAAGATTT

701 GCAGCACTCA GGACACAAGC GTQGAGCTCT ATCACCACAT CAACTCCATG

751 TCTCTGGATA TAATCATGAA ATGCCCTTTC AGCAAGGAGA CCAACTGCCA

801 GACAAACAGC ACCCATGATC CTTATGCAAA AGCCATATT? GAAC7CAGCA

851 AAATCATATT TCACCGCTTC TACAGTTTGT TGTATCACAC TCACATAATT 

901 TTCAAACTCA GCCCTCAGGG CTACCGCTTC CAGAAGTTAA GCCGAGTCTT 

951 GAATCAGTAC ACAGATACAA TAATCCAGGA AAGAAAGAAA TCCCTCCAGC 

1001 CTGGGGTAAA GCACGA7AAC ACTCCCAACA GGAAG7ACCA GGATTTTCTG 

1051 GATATTGTCC TTTCTGCCAA GCATGAAACT CGTAGCAGCT TCTCACATAT

1101 TGATCTACAC TCTGAAGTGA GCACATTCCT GTTCGCAGGA CATGACACCT

1151 TGGCAGCAAG CATCTCCTGG ATCC7TTACT GCCTCGCTCT GAACCCTGAG 

1201 CATCAAGAGA GATGCCGCGA GGAGGTCAGG GGCATCCTGG GGGATGGGTC 

1251 TTCTATCACT TGGGACCAGC TCGCTGAGAT GTCGTACACC ACAATGTGCA 

1301 TCAAGCACAC CTCCCGATTG ATTCCTCCAC TCCCGTCCAT TTCCACAGAT

1351 CTCACCAAGC CACTTACCTT CCCAGATtKJA TGCACATTGC CTGCAGGGAT 

1401 CACCGTGGTT CTTACTATTT GCGGTCTTCA CCACAACCCT GCTCCTGTCT 

1451 GGAAAAACCC AAACGTCTTT GACCCCTTGA GGTTCTCTCA GCAGAATTCT

1501 GATCACAGAC ACCCCTATCC CTACTTACCA TTCTCAGCTC GATCAAGGAA

1551 CTGCATTGGG CAGGAGTTTG CCATCATTGA GTTAAAGCTA ACCAirecCT 

1601 TGATTCTGCT CCACTTCAGA GTGACTCCAG AOCCCACCAG GCCTCTTACT 

1651 TTCCCCAACC ATTTTATCCT CAAGCCCAAC AATGGCATGT ATTTGCACCT 

1701 GAAGAAACTC TCTGAATGTT AGATCTCACG GTACAATGAT TAAACGTACT 

1751 rrCTTTTTCG AAGTTAAATT TACACCTAAT GATCCAAGCA GATACAAAGG

1801 GATCAATGTA TGGTGGGAGC ATTGCAGCTT GGTGGGATAG GCGTCTCTGT 

1851 GAAGAGATCC AAAATCATTT CTACCTACAC AGTGTGTCAG CTAGATCTCT 

1901 TTCTATATAA CTTTGGGAGA TTTTCAGATC TTTTCTGTTA AACTTTCACT 

1951 ACTATTAATC CTGTATACAC CAATAGACTT TCATATATTT TCTGTTGTTT 

2001 TTAAAATAGT TTTCAGAAn ATGCAAGTAA TAAGTGCATG TATCCTCACT 

2051 GTCAAAAATT CCCAACACTA GAAAATCATG TACAATAAAA ATTTTAAATC 

2101 TCACTTCACT TAGCCGACAT TCCATGCCCT GACCAATCCT ACTCCTTTTC 

2151 CTAAAAACAC AATAATTTGG TGTGCATTCT TTCAGACTTT TTOCTATACA

2201 TTTTATATGT ACAAATCTAG CAATCTATTT GTATAGATCT CATCATTCCT 

2251 ATATTCTTAT TGATTTTTrT CACTTAATAA AAATTCAOCT TATTCCTTAA 

2301 AAAAAAAAAA AAAAAAAAAA AAAAAAA 
(SEQ ID NO:1)

5'UTR: 1-189

Start Codon: 190
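
The feature annotation above is given in 1-based, inclusive positions. The following is a minimal sketch, under stated assumptions, of how such positions map onto zero-based string slices. The function name `feature_slice` and the toy sequence are hypothetical illustrations; the toy sequence is not the sequence of Figure 1.

```python
def feature_slice(sequence: str, start: int, end: int) -> str:
    """Return the region start..end of sequence, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    # Position 1 is index 0; the inclusive end maps to an exclusive slice bound.
    return sequence[start - 1:end]


# Hypothetical toy transcript: a 6-base 5'UTR followed by a start codon at position 7.
toy = "GCCACCATGGCT"
utr5 = feature_slice(toy, 1, 6)         # analogous to a "5'UTR: 1-6" annotation
start_codon = feature_slice(toy, 7, 9)  # analogous to a "Start Codon: 7" annotation
```

Applied to a verified copy of the Figure 1 sequence, the same call with positions 1 and 189 would return the annotated 5'UTR.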
(57) Abstract: The present invention provides amino acid
sequences of peptides that are encoded by genes within the human 
genome, the drug-metabolizing enzyme peptides of the present 
invention. The present invention specifically provides isolated 
peptide and nucleic acid molecules, methods of identifying 
orthologs and paralogs of the drug-metabolizing enzyme peptides, 
and methods of identifying modulators of the drug-metabolizing 
enzyme peptides. 
ISOLATED HUMAN DRUG-METABOLIZING PROTEINS, NUCLEIC ACID
MOLECULES ENCODING HUMAN DRUG-METABOLIZING PROTEINS,
AND USES THEREOF

RELATED APPLICATIONS

The present application claims priority to provisional application U.S. Serial No.
60/241,745, filed October 20, 2000 (Atty. Docket CL000897-PROV), and is a continuation-in-part
of application U.S. Serial No. 09/739,456, filed December 19, 2000 (Atty. Docket
CL000897), continuation-in-part of 09/818,647, filed March 28, 2001 (Atty. Docket CL00897-CD?),
and U.S. Serial No. 09/852,067, filed May 10, 2001 (Atty. Docket CL000897-CIP-B).

FIELD OF THE INVENTION 

The present invention is in the field of drug-metabolizing proteins that are related to the 
omega-hydroxylase cytochrome P450 drug-metabolizing enzyme subfamily, recombinant DNA 
molecules and protein production. The present invention specifically provides novel
drug-metabolizing peptides and proteins and nucleic acid molecules encoding such protein molecules,
for use in the development of human therapeutics and human therapeutic development. 

BACKGROUND OF THE INVENTION 

Drug-Metabolizing Proteins 

Induction of drug-metabolizing enzymes ("DMEs") is a common biological response to
xenobiotics, the mechanisms and consequences of which are important in academic, industrial,
and regulatory areas of pharmacology and toxicology.

For most drugs, drug-metabolizing enzymes determine how long and how much of a drug
remains in the body. Thus, developers of drugs recognize the importance of characterizing a drug
candidate's interaction with these enzymes. For example, polymorphisms of the
drug-metabolizing enzyme CYP2D6, a member of the cytochrome p450 ("CYP") superfamily, yield
phenotypes of slow or ultra-rapid metabolizers of a wide spectrum of drugs including
antidepressants, antipsychotics, beta-blockers, and antiarrhythmics. Such abnormal rates of drug
metabolism can lead to drug ineffectiveness or to systemic accumulation and toxicity.



For pharmaceutical scientists developing a candidate drug, it is important to know as early
as possible in the design phase which enzymes metabolize the drug candidate and the speed with
which they do it. Historically, the enzymes on a drug's metabolic pathway were determined
through metabolism studies in animals, but this approach has now been largely supplanted by the
use of human tissues or cloned drug-metabolizing enzymes to provide insights into the specific
role of individual forms of these enzymes. Using these tools, the qualitative and quantitative fate
of a drug candidate can be predicted prior to its first administration to humans. As a
consequence, the selection and optimization of desirable characteristics of metabolism are
possible early in the development process, thus avoiding unanticipated toxicity problems and
associated costs subsequent to the drug's clinical investigation. Moreover, the effect of one drug
on another's disposition can be inferred.

Known drug-metabolizing enzymes include the cytochrome p450 ("CYP") superfamily,
N-acetyl transferases ("NAT"), UDP-glucuronosyl transferases ("UGT"), methyl transferases,
alcohol dehydrogenase ("ADH"), aldehyde dehydrogenase ("ALDH"), dihydropyrimidine
dehydrogenase ("DPD"), NADPH:quinone oxidoreductase ("NQO" or "DT diaphorase"),
catechol O-methyltransferase ("COMT"), glutathione S-transferase ("GST"), histamine
methyltransferase ("HMT"), sulfotransferases ("ST"), thiopurine methyltransferase ("TPMT"),
and epoxide hydroxylase. Drug-metabolizing enzymes are generally classified into two phases
according to their metabolic function. Phase I enzymes catalyze modification of functional
groups, and phase II enzymes catalyze conjugation with endogenous substituents. These
classifications should not be construed as exclusive or exhaustive, as other mechanisms of drug
metabolism have been discovered. For example, the use of active transport mechanisms has been
characterized as part of the process of detoxification.

Phase I reactions include catabolic processes such as deamination of aminases, hydrolysis
of esters and amides, conjugation reactions with, for example, glycine or sulfate, oxidation by
the cytochrome p450 oxidation/reduction enzyme system and degradation in the fatty acid
pathway. Hydrolysis reactions occur mainly in the liver and plasma by a variety of non-specific
hydrolases and esterases. Both deaminases and amidases, also localized in the liver and serum,
carry out a large part of the catabolic process. Reduction reactions occur mainly intracellularly in
the endoplasmic reticulum.

Phase II enzymes detoxify toxic substances by catalyzing their conjugation with water-soluble
substances, thus increasing toxins' solubility in water and increasing their rate of
excretion. Additionally, conjugation reduces the toxins' biological reactivity. Examples of
phase II enzymes include glutathione S-transferases and UDP-glucuronosyl transferases, which
catalyze conjugation to glutathione and glucuronic acid, respectively. Transferases perform
conjugation reactions mainly in the kidneys and liver.

The liver is the primary site of elimination of most drugs, including psychoactive drugs,
and contains a plurality of both phase I and phase II enzymes that oxidize or conjugate drugs,
respectively.

Physicians currently prescribe drugs and their dosages based on a population average and
fail to take genetic variability into account. The variability between individuals in drug
metabolism is usually due to both genetic and environmental factors, in particular, how the
drug-metabolizing enzymes are controlled. With certain enzymes, the genetic component
predominates and variability is associated with variants of the normal, wild-type enzyme.

Most drug-metabolizing enzymes exhibit clinically relevant genetic polymorphisms.
Essentially all of the major human enzymes responsible for modification of functional groups or
conjugation with endogenous substituents exhibit common polymorphisms at the genomic level.
For example, polymorphisms expressing a non-functioning variant enzyme result in a sub-group
of patients in the population who are more prone to the concentration-dependent effects of a
drug. This sub-group of patients may show toxic side effects to a dose of drug that is otherwise
without side effects in the general population. Recent developments in genotyping allow
identification of affected individuals. As a result, their atypical metabolism and likely response
to a drug metabolized by the affected enzyme can be understood and predicted, thus permitting
the physician to adjust the dose of drug they receive to achieve improved therapy.

A similar approach is also becoming important in identifying risk factors associated with
the development of various cancers. This is because the enzymes involved in drug metabolism
are also responsible for the activation and detoxification of chemical carcinogens.
Abnormal activity of drug-metabolizing enzymes has been implicated in a range of
human diseases, including cancer, Parkinson's disease, myotonic dystrophy, and developmental
defects.



Cytochrome p450

An example of a phase I drug-metabolizing enzyme is the cytochrome p450 ("CYP")
superfamily, the members of which comprise the major drug-metabolizing enzymes expressed in
the liver. The CYP superfamily comprises heme proteins which catalyze the oxidation and
dehydrogenation of a number of endogenous and exogenous lipophilic compounds. The CYP
superfamily has immense diversity in its functions, with hundreds of isoforms in many species
catalyzing many types of chemical reactions. The CYP superfamily comprises at least 30 related
enzymes, which are divided into different families according to their amino acid homology.
Examples of CYP families include CYP families 1, 2, 3 and 4, which comprise endoplasmic
reticulum proteins responsible for the metabolism of drugs and other xenobiotics.
Approximately 10-15 individual gene products within these four families metabolize thousands
of structurally diverse compounds. It is estimated that collectively the enzymes in the CYP
superfamily participate in the metabolism of greater than 80% of all available drugs used in
humans. For example, the CYP1A subfamily comprises CYP1A2, which metabolizes several
widely used drugs, including acetaminophen, amitriptyline, caffeine, clozapine, haloperidol,
imipramine, olanzapine, ondansetron, phenacetin, propafenone, propranolol, tacrine,
theophylline, and verapamil. CYP enzymes also play roles in the metabolism of
some endogenous substrates including prostaglandins and steroids.

Some CYP enzymes exist in a polymorphic form, meaning that a small percentage of the
population possesses mutant genes that alter the activity of the enzyme, usually by diminishing
or abolishing activity. For example, a genetic polymorphism has been well characterized with the
CYP2C19 and CYP2D6 genes. Substrates of CYP2C19 include clomipramine, diazepam,
imipramine, mephenytoin, moclobemide, omeprazole, phenytoin, propranolol, and tolbutamide.
Substrates of CYP2D6 include alprenolol, amitriptyline, chlorpheniramine, clomipramine,
codeine, desipramine, dextromethorphan, encainide, fluoxetine, haloperidol, imipramine,
indoramin, metoprolol, nortriptyline, ondansetron, oxycodone, paroxetine, propranolol, and
propafenone. Polymorphic variants of these genes metabolize these substrates at different rates,
which can affect a patient's effective therapeutic dosage.
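
The two substrate lists above can be held in a small lookup table to show which drugs depend on more than one polymorphic enzyme. The following is a minimal sketch only: the drug names are copied from the preceding paragraph, while the names `CYP_SUBSTRATES`, `shared_substrates`, and `enzymes_for` are hypothetical and chosen for illustration.

```python
# Substrate lists as given in the preceding paragraph.
CYP_SUBSTRATES = {
    "CYP2C19": {
        "clomipramine", "diazepam", "imipramine", "mephenytoin", "moclobemide",
        "omeprazole", "phenytoin", "propranolol", "tolbutamide",
    },
    "CYP2D6": {
        "alprenolol", "amitriptyline", "chlorpheniramine", "clomipramine",
        "codeine", "desipramine", "dextromethorphan", "encainide", "fluoxetine",
        "haloperidol", "imipramine", "indoramin", "metoprolol", "nortriptyline",
        "ondansetron", "oxycodone", "paroxetine", "propranolol", "propafenone",
    },
}


def shared_substrates(enzyme_a: str, enzyme_b: str) -> set:
    """Drugs listed as substrates of both enzymes."""
    return CYP_SUBSTRATES[enzyme_a] & CYP_SUBSTRATES[enzyme_b]


def enzymes_for(drug: str) -> list:
    """Enzymes in the table that list the given drug as a substrate."""
    return sorted(name for name, drugs in CYP_SUBSTRATES.items() if drug in drugs)


shared = sorted(shared_substrates("CYP2C19", "CYP2D6"))
```

A drug appearing in `shared` is listed under both enzymes, so a variant in either gene could, in principle, alter how it is metabolized.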
While the substrate specificity of CYPs must be very broad to accommodate the
metabolism of all of these compounds, each individual CYP gene product has a narrower
substrate specificity defined by its binding and catalytic sites. Drug metabolism can thereby be
regulated by changes in the amount or activity of specific CYP gene products. Methods of CYP
regulation include genetic differences in the expression of CYP gene products (i.e., genetic
polymorphisms), inhibition of CYP metabolism by other xenobiotics that also bind to the CYP,
and induction of certain CYPs by the drug itself or other xenobiotics. Inhibition and induction of
CYPs are among the most common mechanisms of adverse drug interactions. For example, the
CYP3A subfamily is involved in clinically significant drug interactions involving nonsedating
antihistamines and cisapride that may result in cardiac dysrhythmias. In another example,
CYP3A4 and CYP1A2 enzymes are involved in drug interactions involving theophylline. In yet
another example, CYP2D6 is responsible for the metabolism of many psychotherapeutic agents.
Additionally, CYP enzymes metabolize the protease inhibitors used to treat patients infected
with the human immunodeficiency virus. By understanding the unique functions and
characteristics of these enzymes, physicians may better anticipate and manage drug interactions
and may predict or explain an individual's response to a particular therapeutic regimen.

Examples of reactions catalyzed by the CYP superfamily include peroxidative reactions
utilizing peroxides as oxygen donors in hydroxylation reactions, as substrates for reductive
beta-scission, and as peroxyhemiacetal intermediates in the cleavage of aldehydes to formate and
alkenes. Lipid hydroperoxides undergo reductive beta-cleavage to give hydrocarbons and
aldehydic acids. One of these products, trans-4-hydroxynonenal, inactivates CYP, particularly
alcohol-inducible 2E1, in what may be a negative regulatory process. Although a CYP
iron-oxene species is believed to be the oxygen donor in most hydroxylation reactions, an iron-peroxy
species is apparently involved in the deformylation of many aldehydes with desaturation of the
remaining structure, as in aromatization reactions.

Examples of drugs with oxidative metabolism associated with CYP enzymes include
acetaminophen, alfentanil, alprazolam, alprenolol, amiodarone, amitriptyline, astemizole,
buspirone, caffeine, carbamazepine, chlorpheniramine, cisapride, clomipramine,
clozapine, codeine, colchicine, cortisol, cyclophosphamide, cyclosporine, dapsone, desipramine,
dextromethorphan, diazepam, diclofenac, diltiazem, encainide, erythromycin, estradiol,
felodipine, fluoxetine, fluvastatin, haloperidol, ibuprofen, imipramine, indinavir, indomethacin,
indoramin, irbesartan, lidocaine, losartan, macrolide antibiotics, mephenytoin, methadone,
metoprolol, mexiletine, midazolam, moclobemide, naproxen, nefazodone, nicardipine,
nifedipine, nitrendipine, nortriptyline, olanzapine, omeprazole, ondansetron, oxycodone,
paclitaxel, paroxetine, phenacetin, phenytoin, piroxicam, progesterone, propafenone,
propranolol, quinidine, ritonavir, saquinavir, sertraline, sildenafil, S-warfarin, tacrine, tamoxifen,
tenoxicam, terfenadine, testosterone, theophylline, timolol, tolbutamide, triazolam, verapamil,
and vinblastine.

Abnormal activity of phase I enzymes has been implicated in a range of human diseases.
For example, enhanced CYP2D6 activity has been related to malignancies of the bladder, liver,
pharynx, stomach and lungs, whereas decreased CYP2D activity has been linked to an increased
risk of Parkinson's disease. Other syndromes and developmental defects associated with
deficiencies in the CYP superfamily include cerebrotendinous xanthomatosis, adrenal
hyperplasia, gynecomastia, and myotonic dystrophy.

Omega-Hydroxylase Cytochrome P450

The novel human protein, and encoding gene, provided by the present invention is related to
the omega-hydroxylase cytochrome P450 family, which includes, for example, cytochrome P450
4A4 (CYP4A4), cytochrome P-450p-2, prostaglandin omega-hydroxylase, and laurate
omega-hydroxylase. Omega-hydroxylase cytochrome P450 proteins catalyze omega- (including omega-1)
hydroxylation of prostaglandin A and fatty acids such as caprate, laurate, myristate, and palmitate
(Yoshimura et al., J Biochem (Tokyo) 1990 Oct;108(4):544-8). CYP4A4 is elevated during
pregnancy (Palmer et al., Arch Biochem Biophys 1993 Feb 1;300(2):670-6).

See also Matsubara et al., J Biol Chem 1987 Sep 25;262(27):13366-71; Yamamoto et al.,
J Biochem (Tokyo) 1984;96:593-603; Yokotani et al., Eur J Biochem 1991 Mar 28;196(3):531-6; and
Johnson et al., Biochemistry 1990 Jan 30;29(4):873-9.



Cytochromes, such as the protein provided by the present invention, have many utilities,
in addition to those described above. Cytochromes not only metabolize normal physiological
substrates but also neutralize environmental toxins. In addition to oxidizing steroids, fatty acids,
and foreign compounds in liver cells, cytochromes can also be induced by toxic chemicals,
pesticides, and carcinogens.

Immunological and PCR-based assays for cytochromes may be used to determine toxicity
and turnover rate of experimental medicines. Selective cytotoxic drugs can be designed that
interact with a particular cytochrome and trigger cell death, thereby providing potential new
treatments for cancer.
Cytochromes can generate free radicals that cause myocardial cell injury and induce
endothelial cell damage. In experimental models, alpha-tocopherol and other anti-oxidants
suppress generation of free radicals. Glutathione and glutathione peroxidase contribute to natural
protection against free radical-induced cell damage. Characterization of all cytochromes will
assist development of more efficient anti-oxidants. The sequence provided by the present
invention can be used to design specific chemopreventive drugs.

The cytochrome provided herein, as well as other human cytochromes, can be used in a
high-throughput drug screen to discover anti-parasitic drugs that inhibit non-human oxygenases
but exhibit no toxicity for the human enzymes.

For a further review of the CYP superfamily, see Igarashi et al., Arch Biochem Biophys
1997 Mar 1;339(1):85-9; Med Lett Drugs Ther 2000 Apr 17;42(1076):35-6 (no authors listed);
Fowler et al., Biochemistry 2000 Apr 18;39(15):4406-14; Lamb et al., Chem Biol Interact 2000
Mar 15;125(3):165-75; Chiba et al., Xenobiotica 2000 Feb;30(2):117-29; and Meehan et al., Am
J Hum Genet 1988 Jan;42(1):26-37.

The CYP superfamily is a major target for drug action and development. Accordingly, it is
valuable to the field of pharmaceutical development to identify and characterize previously
unknown members of the CYP superfamily.



UDP-glucuronosyltransferases

Potential drug interactions involving phase II metabolism are increasingly being
recognized. An important group of phase II enzymes involved in drug metabolism are the
glucuronosyltransferases, especially the UDP-glucuronyltransferase ("UGT") superfamily.
Members of the UGT superfamily catalyze the enzymatic addition of UDP glucuronic acid as a
sugar donor to fat-soluble chemicals, a process which increases their solubility in water and
increases their rate of excretion. In mammals, glucuronic acid is the main sugar that is used to
prevent the accumulation of waste products of metabolism and fat-soluble chemicals from the
environment to toxic levels in the body. Both inducers and inhibitors of
glucuronosyltransferases are known and have the potential to affect the plasma concentration and
actions of important drugs, including psychotropic drugs.

The UGT superfamily comprises several families of enzymes in several species defined
with a nomenclature similar to that used to define members of the CYP superfamily. In animals,
yeast, plants and bacteria there are at least 110 distinct known members of the UGT superfamily.
As many as 33 families have been defined, with three families identified in humans. Different
UGT families are defined as having <45% amino acid sequence homology; within subfamilies
there is approximately 60% homology. The members of the UGT superfamily are part of a
further superfamily of UDP glycosyltransferases found in animals, plants and bacteria.
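
The homology cut-offs quoted above can be illustrated with a short calculation. The following is a sketch under stated assumptions: it treats percent identity over a pre-computed pairwise alignment as a stand-in for "homology", and it treats the approximately 60% figure as a hard threshold, neither of which the text itself specifies. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns, ignoring gap ('-') columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def ugt_relationship(identity: float) -> str:
    """Map a percent identity onto the family/subfamily cut-offs quoted in the text.

    Assumption: below 45% means different families, 60% or more means the same
    subfamily, and anything in between means the same family but a different subfamily.
    """
    if identity < 45.0:
        return "different families"
    if identity >= 60.0:
        return "same subfamily"
    return "same family, different subfamily"


# Hypothetical aligned fragments, for illustration only.
example_identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
```

Real nomenclature decisions rest on full-length alignments and curated review, so this only illustrates the arithmetic behind the stated percentages.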

The role of phase II enzymes, and of UGT enzymes in particular, is being increasingly
recognized as important in psychopharmacology. UGT enzymes conjugate many important
psychotropic drugs and are an important source of variability in drug response and drug
interactions. For example, the benzodiazepines lorazepam, oxazepam, and temazepam undergo
phase II reactions exclusively before being excreted into the urine.

Phase II enzymes metabolize and detoxify hazardous substances, such as carcinogens.
The expression of genes encoding phase II enzymes is known to be up-regulated by hundreds of
agents. For example, oltipraz is known to up-regulate phase II enzyme expression. Studies have
demonstrated protection from the cancer-causing effects of carcinogens when selected phase II
enzyme inducers are administered prior to the carcinogens. The potential use of phase II enzyme
inducers in humans for prevention of cancers related to exposure to carcinogens has prompted
studies aimed at understanding their molecular effects. Current biochemical and molecular
biological research methodologies can be used to identify and characterize selective phase II
enzyme inducers and their targets. Identification of genes responding to cancer chemopreventive
agents will facilitate studies of their basic mechanism and provide insights about the relationship
between gene regulation, enzyme polymorphism, and carcinogen detoxification.

Examples of drugs with conjugative metabolism associated with UGT enzymes include
amitriptyline, buprenorphine, chlorpromazine, clozapine, codeine, cyproheptadine,
dihydrocodeine, doxepin, imipramine, lamotrigine, lorazepam, morphine, nalorphine, naltrexone,
temazepam, and valproate.

Abnormal activity of phase II enzymes has been implicated in a range of human diseases.
For example, Gilbert syndrome is an autosomal dominant disorder caused by mutation in the
UGT1 gene, and mutations in the UGT1A1 enzyme have been demonstrated to be responsible
for Crigler-Najjar syndrome.

The UGT superfamily is a major target for drug action and development. Accordingly, it is
valuable to the field of pharmaceutical development to identify and characterize previously
unknown members of the UGT superfamily.

Drug-metabolizing enzymes, particularly members of the omega-hydroxylase cytochrome
P450 drug-metabolizing enzyme subfamily, are a major target for drug action and development.
Accordingly, it is valuable to the field of pharmaceutical development to identify and characterize
previously unknown members of this subfamily of drug-metabolizing proteins. The present
invention advances the state of the art by providing previously unidentified human
drug-metabolizing proteins that have homology to members of the omega-hydroxylase cytochrome P450
drug-metabolizing enzyme subfamily.



SUMMARY OF THE INVENTION 

The present invention is based in part on the identification of amino acid sequences of
human drug-metabolizing enzyme peptides and proteins that are related to the
omega-hydroxylase cytochrome P450 drug-metabolizing enzyme subfamily, as well as allelic variants
and other mammalian orthologs thereof. These unique peptide sequences, and nucleic acid
sequences that encode these peptides, can be used as models for the development of human
therapeutic targets, aid in the identification of therapeutic proteins, and serve as targets for the
development of human therapeutic agents that modulate drug-metabolizing enzyme activity in
cells and tissues that express the drug-metabolizing enzyme. Experimental data as provided in
Figure 1 indicates expression in humans in the stomach, brain (including infant), endometrial
tumors, prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast, and
hepatocellular carcinomas.



DESCRIPTION OF THE FIGURE SHEETS

FIGURE 1 provides the nucleotide sequence of a cDNA molecule that encodes the
drug-metabolizing enzyme protein of the present invention (SEQ ID NO:1). In addition, structure
and functional information is provided, such as ATG start, stop and tissue distribution, where
available, that allows one to readily determine specific uses of inventions based on this
molecular sequence. Experimental data as provided in Figure 1 indicates expression in humans
in the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland
tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas.

FIGURE 2 provides the predicted amino acid sequence of the drug-metabolizing enzyme
of the present invention (SEQ ID NO:2). In addition, structure and functional information such
as protein family, function, and modification sites is provided where available, allowing one to
readily determine specific uses of inventions based on this molecular sequence.
FIGURE 3 provides genomic sequences that span the gene encoding the
drug-metabolizing enzyme protein of the present invention (SEQ ID NO:3). In addition, structure and
functional information, such as intron/exon structure, promoter location, etc., is provided where
available, allowing one to readily determine specific uses of inventions based on this molecular
sequence. As illustrated in Figure 3, SNPs were identified at 45 different nucleotide positions.

DETAILED DESCRIPTION OF THE INVENTION 

General Description 

The present invention is based on the sequencing of the human genome. During the
sequencing and assembly of the human genome, analysis of the sequence information revealed
previously unidentified fragments of the human genome that encode peptides that share
structural and/or sequence homology to protein/peptide/domains identified and characterized
within the art as being a drug-metabolizing enzyme protein or part of a drug-metabolizing
enzyme protein and are related to the omega-hydroxylase cytochrome P450 drug-metabolizing
enzyme subfamily. Utilizing these sequences, additional genomic sequences were assembled
and transcript and/or cDNA sequences were isolated and characterized. Based on this analysis,
the present invention provides amino acid sequences of human drug-metabolizing enzyme
peptides and proteins that are related to the omega-hydroxylase cytochrome P450
drug-metabolizing enzyme subfamily, nucleic acid sequences in the form of transcript sequences,
cDNA sequences and/or genomic sequences that encode these drug-metabolizing enzyme
peptides and proteins, nucleic acid variation (allelic information), tissue distribution of
expression, and information about the closest art known protein/peptide/domain that has
structural or sequence homology to the drug-metabolizing enzyme of the present invention.

In addition to being previously unknown, the peptides that are provided in the present
invention are selected based on their ability to be used for the development of commercially
important products and services. Specifically, the present peptides are selected based on
homology and/or structural relatedness to known drug-metabolizing enzyme proteins of the
omega-hydroxylase cytochrome P450 drug-metabolizing enzyme subfamily and the expression
pattern observed. Experimental data as provided in Figure 1 indicates expression in humans in
the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland
tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. The art has clearly
established the commercial importance of members of this family of proteins and proteins that
have expression patterns similar to that of the present gene. Some of the more specific features
of the peptides of the present invention, and the uses thereof, are described herein, particularly in
the Background of the Invention and in the annotation provided in the Figures, and/or are known
within the art for each of the known omega-hydroxylase cytochrome P450 family or subfamily
of drug-metabolizing enzyme proteins.

Specific Embodiments 
Peptide Molecules 

The present invention provides nucleic acid sequences that encode protein molecules that
have been identified as being members of the drug-metabolizing enzyme family of proteins and
are related to the omega-hydroxylase cytochrome P450 drug-metabolizing enzyme subfamily
(protein sequences are provided in Figure 2, transcript/cDNA sequences are provided in Figure 1
and genomic sequences are provided in Figure 3). The peptide sequences provided in Figure 2,
as well as the obvious variants described herein, particularly allelic variants as identified herein
and using the information in Figure 3, will be referred to herein as the drug-metabolizing enzyme
peptides of the present invention, drug-metabolizing enzyme peptides, or peptides/proteins of the
present invention.

The present invention provides isolated peptide and protein molecules that consist of,
consist essentially of, or comprise the amino acid sequences of the drug-metabolizing enzyme
peptides disclosed in Figure 2 (encoded by the nucleic acid molecule shown in Figure 1,
transcript/cDNA, or Figure 3, genomic sequence), as well as all obvious variants of these
peptides that are within the art to make and use. Some of these variants are described in detail
below.

As used herein, a peptide is said to be "isolated" or "purified" when it is substantially free
of cellular material or free of chemical precursors or other chemicals. The peptides of the present
invention can be purified to homogeneity or other degrees of purity. The level of purification will
be based on the intended use. The critical feature is that the preparation allows for the desired
function of the peptide, even if in the presence of considerable amounts of other components (the
features of an isolated nucleic acid molecule are discussed below).

30 In some uses, "substantially fi-ee of cellular material" includes preparations of the peptide 

having less than about 30% (by dry weight) other proteins (i.e., contanainating protein), less than 

about 20% other proteins, less than about 10% other proteins, or less than about 5% other proteins. 

11 
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When the peptide is recombinantly produced, it can also be substantially free of culture medium, 
i.e., culture medium represents less than about 20% of the volume of the protein preparation. 

The language "substantially free of chemical precursors or other chemicals" includes
preparations of the peptide in which it is separated from chemical precursors or other chemicals that
are involved in its synthesis. In one embodiment, the language "substantially free of chemical
precursors or other chemicals" includes preparations of the drug-metabolizing enzyme peptide
having less than about 30% (by dry weight) chemical precursors or other chemicals, less than about
20% chemical precursors or other chemicals, less than about 10% chemical precursors or other
chemicals, or less than about 5% chemical precursors or other chemicals.

The isolated drug-metabolizing enzyme peptide can be purified from cells that naturally
express it, purified from cells that have been altered to express it (recombinant), or synthesized
using known protein synthesis methods. Experimental data as provided in Figure 1 indicates
expression in humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney,
adrenal gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. For
example, a nucleic acid molecule encoding the drug-metabolizing enzyme peptide is cloned into an
expression vector, the expression vector introduced into a host cell and the protein expressed in the
host cell. The protein can then be isolated from the cells by an appropriate purification scheme
using standard protein purification techniques. Many of these techniques are described in detail
below.

Accordingly, the present invention provides proteins that consist of the amino acid
sequences provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the
transcript/cDNA nucleic acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic
sequences provided in Figure 3 (SEQ ID NO:3). The amino acid sequence of such a protein is
provided in Figure 2. A protein consists of an amino acid sequence when the amino acid sequence
is the final amino acid sequence of the protein.

The present invention further provides proteins that consist essentially of the amino acid
sequences provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the
transcript/cDNA nucleic acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic
sequences provided in Figure 3 (SEQ ID NO:3). A protein consists essentially of an amino acid
sequence when such an amino acid sequence is present with only a few additional amino acid
residues, for example from about 1 to about 100 or so additional residues, typically from 1 to about
20 additional residues in the final protein.

The present invention further provides proteins that comprise the amino acid sequences
provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the transcript/cDNA nucleic
acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic sequences provided in Figure 3
(SEQ ID NO:3). A protein comprises an amino acid sequence when the amino acid sequence is at
least part of the final amino acid sequence of the protein. In such a fashion, the protein can be only
the peptide or have additional amino acid molecules, such as amino acid residues (contiguous
encoded sequence) that are naturally associated with it or heterologous amino acid residues/peptide
sequences. Such a protein can have a few additional amino acid residues or can comprise several
hundred or more additional amino acids. The preferred classes of proteins that are comprised of the
drug-metabolizing enzyme peptides of the present invention are the naturally occurring mature
proteins. A brief description of how various types of these proteins can be made/isolated is
provided below.

The drug-metabolizing enzyme peptides of the present invention can be attached to
heterologous sequences to form chimeric or fusion proteins. Such chimeric and fusion proteins
comprise a drug-metabolizing enzyme peptide operatively linked to a heterologous protein having
an amino acid sequence not substantially homologous to the drug-metabolizing enzyme peptide.
"Operatively linked" indicates that the drug-metabolizing enzyme peptide and the heterologous
protein are fused in-frame. The heterologous protein can be fused to the N-terminus or C-terminus
of the drug-metabolizing enzyme peptide.

In some uses, the fusion protein does not affect the activity of the drug-metabolizing enzyme
peptide per se. For example, the fusion protein can include, but is not limited to, enzymatic fusion
proteins, for example beta-galactosidase fusions, yeast two-hybrid GAL fusions, poly-His fusions,
MYC-tagged, HI-tagged and Ig fusions. Such fusion proteins, particularly poly-His fusions, can
facilitate the purification of recombinant drug-metabolizing enzyme peptide. In certain host cells
(e.g., mammalian host cells), expression and/or secretion of a protein can be increased by using a
heterologous signal sequence.

A chimeric or fusion protein can be produced by standard recombinant DNA techniques.
For example, DNA fragments coding for the different protein sequences are ligated together
in-frame in accordance with conventional techniques. In another embodiment, the fusion gene can be
synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR
amplification of gene fragments can be carried out using anchor primers which give rise to
complementary overhangs between two consecutive gene fragments which can subsequently be
annealed and re-amplified to generate a chimeric gene sequence (see Ausubel et al., Current
Protocols in Molecular Biology, 1992). Moreover, many expression vectors are commercially
available that already encode a fusion moiety (e.g., a GST protein). A drug-metabolizing enzyme
peptide-encoding nucleic acid can be cloned into such an expression vector such that the fusion
moiety is linked in-frame to the drug-metabolizing enzyme peptide.

As mentioned above, the present invention also provides and enables obvious variants of the
amino acid sequence of the proteins of the present invention, such as naturally occurring mature
forms of the peptide, allelic/sequence variants of the peptides, non-naturally occurring
recombinantly derived variants of the peptides, and orthologs and paralogs of the peptides. Such
variants can readily be generated using art-known techniques in the fields of recombinant nucleic
acid technology and protein biochemistry. It is understood, however, that variants exclude any
amino acid sequences disclosed prior to the invention.

Such variants can readily be identified/made using molecular techniques and the sequence
information disclosed herein. Further, such variants can readily be distinguished from other
peptides based on sequence and/or structural homology to the drug-metabolizing enzyme peptides
of the present invention. The degree of homology/identity present will be based primarily on
whether the peptide is a functional variant or non-functional variant, the amount of divergence
present in the paralog family and the evolutionary distance between the orthologs.

To determine the percent identity of two amino acid sequences or two nucleic acid
sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be
introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal
alignment and non-homologous sequences can be disregarded for comparison purposes). In a
preferred embodiment, at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% or more of the length
of a reference sequence is aligned for comparison purposes. The amino acid residues or
nucleotides at corresponding amino acid positions or nucleotide positions are then compared.
When a position in the first sequence is occupied by the same amino acid residue or nucleotide
as the corresponding position in the second sequence, then the molecules are identical at that
position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or
nucleic acid "homology"). The percent identity between the two sequences is a function of the
number of identical positions shared by the sequences, taking into account the number of gaps,
and the length of each gap, which need to be introduced for optimal alignment of the two
sequences.
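
The position-by-position comparison described above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it adopts one common convention (an assumption, not a definition taken from this specification): a column with a gap in only one sequence counts toward the compared length but never as identical. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences carry a gap are ignored. Every other column,
    including one with a gap in a single sequence, counts toward the length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Four identical columns out of six compared columns.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```

Other conventions (for example, dividing by the length of the shorter sequence or excluding gap columns) give different percentages, so the convention should be stated alongside any reported value.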

The comparison of sequences and determination of percent identity and similarity
between two sequences can be accomplished using a mathematical algorithm (Computational
Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing:
Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer
Analysis of Sequence Data, Part 1, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New
Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and
Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York,
1991). In a preferred embodiment, the percent identity between two amino acid sequences is
determined using the Needleman and Wunsch (J. Mol. Biol. (48):444-453 (1970)) algorithm
which has been incorporated into the GAP program in the GCG software package (available at
http://www.gcg.com), using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight
of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred
embodiment, the percent identity between two nucleotide sequences is determined using the
GAP program in the GCG software package (Devereux, J., et al., Nucleic Acids Res. 12(1):387
(1984)) (available at http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of
40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the
percent identity between two amino acid or nucleotide sequences is determined using the
algorithm of E. Myers and W. Miller (CABIOS, 4:11-17 (1989)) which has been incorporated
into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length
penalty of 12 and a gap penalty of 4.
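
As an illustration of the global alignment idea behind the Needleman and Wunsch algorithm, the following Python sketch fills a dynamic-programming score matrix and traces back one optimal alignment. It is a simplified teaching version under stated assumptions: a flat match/mismatch score stands in for a BLOSUM 62 or PAM250 matrix, and a single linear gap penalty stands in for the separate gap weight and length weight used by the GAP program. The function name and default scores are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty (simplified sketch).

    Returns (score, aligned_a, aligned_b), using "-" for gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

A production comparison would substitute a residue scoring matrix and affine gap costs; the matrix fill and traceback structure stay the same.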

The nucleic acid and protein sequences of the present invention can further be used as a
"query sequence" to perform a search against sequence databases to, for example, identify other
family members or related sequences. Such searches can be performed using the NBLAST and
XBLAST programs (version 2.0) of Altschul et al. (J. Mol. Biol. 215:403-10 (1990)). BLAST
nucleotide searches can be performed with the NBLAST program, score = 100, wordlength = 12
to obtain nucleotide sequences homologous to the nucleic acid molecules of the invention.
BLAST protein searches can be performed with the XBLAST program, score = 50, wordlength =
3 to obtain amino acid sequences homologous to the proteins of the invention. To obtain gapped
alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et
al. (Nucleic Acids Res. 25(17):3389-3402 (1997)). When utilizing BLAST and gapped BLAST
programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can
be used.

Full-length pre-processed forms, as well as mature processed forms, of proteins that
comprise one of the peptides of the present invention can readily be identified as having complete
sequence identity to one of the drug-metabolizing enzyme peptides of the present invention as well
as being encoded by the same genetic locus as the drug-metabolizing enzyme peptide provided
herein. The gene encoding the novel drug-metabolizing protein of the present invention is located
on a genome component that has been mapped to human chromosome 1 (as indicated in Figure 3),
which is supported by multiple lines of evidence, such as STS and BAC map data.

Allelic variants of a drug-metabolizing enzyme peptide can readily be identified as being a
human protein having a high degree (significant) of sequence homology/identity to at least a portion
of the drug-metabolizing enzyme peptide as well as being encoded by the same genetic locus as the
drug-metabolizing enzyme peptide provided herein. Genetic locus can readily be determined based
on the genomic information provided in Figure 3, such as the genomic sequence mapped to the
reference human. The gene encoding the novel drug-metabolizing protein of the present invention
is located on a genome component that has been mapped to human chromosome 1 (as indicated in
Figure 3), which is supported by multiple lines of evidence, such as STS and BAC map data. As
used herein, two proteins (or a region of the proteins) have significant homology when the amino
acid sequences are typically at least about 70-80%, 80-90%, and more typically at least about
90-95% or more homologous. A significantly homologous amino acid sequence, according to the
present invention, will be encoded by a nucleic acid sequence that will hybridize to a
drug-metabolizing enzyme peptide encoding nucleic acid molecule under stringent conditions as more
fully described below.

Figure 3 provides SNP information that has been found in the gene encoding the
drug-metabolizing proteins of the present invention. SNPs, including insertion/deletion variants
("indels"), were identified at 45 different nucleotide positions. Changes in the amino acid
sequence caused by these SNPs can readily be determined using the universal genetic code and
the protein sequence provided in Figure 2 as a reference. Positioning of each SNP in exons,
introns, or outside the ORF can readily be determined using the DNA positions given for each
SNP and the start/stop, exon, and intron coordinates given in the features.
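
To make the genetic-code step concrete, the Python sketch below classifies a single-base substitution inside a coding sequence as synonymous, missense, or nonsense. The coding sequence and positions in the example are invented for illustration and are not taken from Figure 1 or Figure 3. The function name, the 0-based position convention, and the restriction to substitutions (indels are not handled) are all assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def snp_effect(cds: str, pos: int, alt: str):
    """Classify a single-base substitution in a coding sequence.

    cds is the coding sequence starting at the first base of the start codon,
    pos is a 0-based index into cds, and alt is the substituted base.
    Returns (reference_residue, variant_residue, kind).
    """
    cds = cds.upper()
    alt = alt.upper()
    if alt not in BASES or len(alt) != 1:
        raise ValueError("alt must be a single base from T, C, A, G")
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if pos < 0 or len(ref_codon) < 3:
        raise ValueError("position does not fall inside a complete codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    toy_cds = "ATGGCTTGG"  # hypothetical sequence encoding Met-Ala-Trp
    print(snp_effect(toy_cds, 5, "C"))
    print(snp_effect(toy_cds, 3, "A"))
    print(snp_effect(toy_cds, 8, "A"))
```

Mapping a genomic SNP position onto the coding sequence requires the exon coordinates from the feature annotation, which this sketch leaves to the caller.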

Paralogs of a drug-metabolizing enzyme peptide can readily be identified as having some
degree of significant sequence homology/identity to at least a portion of the drug-metabolizing
enzyme peptide, as being encoded by a gene from humans, and as having similar activity or
function. Two proteins will typically be considered paralogs when the amino acid sequences are
typically at least about 60% or greater, and more typically at least about 70% or greater,
homologous through a given region or domain. Such paralogs will be encoded by a nucleic acid
sequence that will hybridize to a drug-metabolizing enzyme peptide encoding nucleic acid
molecule under moderate to stringent conditions as more fully described below.

Orthologs of a drug-metabolizing enzyme peptide can readily be identified as having some
degree of significant sequence homology/identity to at least a portion of the drug-metabolizing
enzyme peptide as well as being encoded by a gene from another organism. Preferred orthologs
will be isolated from mammals, preferably primates, for the development of human therapeutic
targets and agents. Such orthologs will be encoded by a nucleic acid sequence that will hybridize
to a drug-metabolizing enzyme peptide encoding nucleic acid molecule under moderate to
stringent conditions, as more fully described below, depending on the degree of relatedness of
the two organisms yielding the proteins.

Non-naturally occurring variants of the drug-metabolizing enzyme peptides of the present
invention can readily be generated using recombinant techniques. Such variants include, but are not
limited to deletions, additions and substitutions in the amino acid sequence of the drug-metabolizing
enzyme peptide. For example, one class of substitutions is conservative amino acid substitutions.
Such substitutions are those that substitute a given amino acid in a drug-metabolizing enzyme
peptide by another amino acid of like characteristics. Typically seen as conservative substitutions
are the replacements, one for another, among the aliphatic amino acids Ala, Val, Leu, and Ile;
interchange of the hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu;
substitution between the amide residues Asn and Gln; exchange of the basic residues Lys and Arg;
and replacements among the aromatic residues Phe and Tyr. Guidance concerning which amino
acid changes are likely to be phenotypically silent is found in Bowie et al., Science 247:1306-1310
(1990).
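
The conservative substitution groups listed above can be encoded directly. The Python sketch below checks whether a substitution stays within one of those groups; the names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and the grouping is limited to the residues named in the preceding paragraph rather than any broader similarity scheme.

```python
# Groups taken from the text: aliphatic, hydroxyl, acidic, amide, basic, aromatic.
CONSERVATIVE_GROUPS = [
    {"Ala", "Val", "Leu", "Ile"},
    {"Ser", "Thr"},
    {"Asp", "Glu"},
    {"Asn", "Gln"},
    {"Lys", "Arg"},
    {"Phe", "Tyr"},
]


def is_conservative(ref: str, alt: str) -> bool:
    """Return True when ref -> alt is a substitution within one listed group."""
    if ref == alt:
        return False  # identical residues are not a substitution
    return any(ref in group and alt in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("Leu", "Ile"))
    print(is_conservative("Asp", "Lys"))
```

Residues not named in the text (for example Gly, Pro, Cys, Met, His, Trp) belong to no group here, so any substitution involving them is reported as non-conservative by this sketch.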

Variant drug-metabolizing enzyme peptides can be fully functional or can lack function in
one or more activities, e.g. ability to bind substrate, ability to phosphorylate substrate, ability to
mediate signaling, etc. Fully functional variants typically contain only conservative variation or
variation in non-critical residues or in non-critical regions. Figure 2 provides the result of protein
analysis and can be used to identify critical domains/regions. Functional variants can also contain
substitution of similar amino acids that result in no change or an insignificant change in function.
Alternatively, such substitutions may positively or negatively affect function to some degree.

Non-functional variants typically contain one or more non-conservative amino acid
substitutions, deletions, insertions, inversions, or truncation or a substitution, insertion, inversion, or
deletion in a critical residue or critical region.

Amino acids that are essential for function can be identified by methods known in the art,
such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham et al., Science
244:1081-1085 (1989)), particularly using the results provided in Figure 2. The latter procedure
introduces single alanine mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity such as drug-metabolizing enzyme activity or in
assays such as an in vitro proliferative activity. Sites that are critical for binding partner/substrate
binding can also be determined by structural analysis such as crystallization, nuclear magnetic
resonance or photoaffinity labeling (Smith et al., J. Mol. Biol. 224:899-904 (1992); de Vos et al.,
Science 255:306-312 (1992)).
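
The design step of alanine-scanning mutagenesis, enumerating one mutant per residue, can be sketched in Python as follows. The sketch only generates the mutant sequences and their conventional labels (for example K2A); the activity testing is laboratory work outside its scope. The function name is hypothetical, and skipping positions that are already alanine is an assumption of this sketch.

```python
def alanine_scan(protein: str):
    """List (label, mutant_sequence) for every single alanine substitution.

    protein is a one-letter amino acid sequence. Positions in labels are
    1-based. Residues that are already alanine are skipped.
    """
    mutants = []
    for index, residue in enumerate(protein):
        if residue == "A":
            continue  # substituting Ala with Ala changes nothing
        label = f"{residue}{index + 1}A"
        mutants.append((label, protein[:index] + "A" + protein[index + 1:]))
    return mutants


if __name__ == "__main__":
    for label, sequence in alanine_scan("MKA"):  # toy sequence, not from Figure 2
        print(label, sequence)
```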

The present invention further provides fragments of the drug-metabolizing enzyme peptides,
in addition to proteins and peptides that comprise and consist of such fragments, particularly those
comprising the residues identified in Figure 2. The fragments to which the invention pertains,
however, are not to be construed as encompassing fragments that may be disclosed publicly prior to
the present invention.

As used herein, a fragment comprises at least 8, 10, 12, 14, 16, or more contiguous amino
acid residues from a drug-metabolizing enzyme peptide. Such fragments can be chosen based on
the ability to retain one or more of the biological activities of the drug-metabolizing enzyme peptide
or could be chosen for the ability to perform a function, e.g. bind a substrate or act as an
immunogen. Particularly important fragments are biologically active fragments, peptides that are,
for example, about 8 or more amino acids in length. Such fragments will typically comprise a
domain or motif of the drug-metabolizing enzyme peptide, e.g., active site, a transmembrane
domain or a substrate-binding domain. Further, possible fragments include, but are not limited to,
domain or motif containing fragments, soluble peptide fragments, and fragments containing
immunogenic structures. Predicted domains and functional sites are readily identifiable by computer
programs well known and readily available to those of skill in the art (e.g., PROSITE analysis). The
results of one such analysis are provided in Figure 2.
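
As a small illustration of what "at least 8 contiguous amino acid residues" covers, the Python sketch below enumerates every contiguous fragment of a sequence within a length range. The function name, the optional upper bound, and the example sequence are hypothetical; the sequence is a toy string, not the peptide of Figure 2.

```python
def contiguous_fragments(protein: str, min_len: int = 8, max_len=None):
    """List (start_position, fragment) for all contiguous fragments.

    start_position is 1-based. Fragments range from min_len residues up to
    max_len residues, or up to the full length when max_len is None.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    n = len(protein)
    upper = n if max_len is None else min(max_len, n)
    return [
        (start + 1, protein[start:start + length])
        for length in range(min_len, upper + 1)
        for start in range(n - length + 1)
    ]


if __name__ == "__main__":
    fragments = contiguous_fragments("MKTAYIAKQR", min_len=8)
    print(len(fragments), fragments[0])
```

The number of fragments grows quadratically with sequence length, so for a full-length protein it is usually more practical to restrict the range to a domain of interest or to a fixed window size.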

Polypeptides often contain amino acids other than the 20 amino acids commonly referred to
as the 20 naturally occurring amino acids. Further, many amino acids, including the terminal amino
acids, may be modified by natural processes, such as processing and other post-translational
modifications, or by chemical modification techniques well known in the art. Common
modifications that occur naturally in drug-metabolizing enzyme peptides are described in basic
texts, detailed monographs, and the research literature, and they are well known to those of skill in
the art (some of these features are identified in Figure 2).

Known modifications include, but are not limited to, acetylation, acylation,
ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety,
covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid
derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond
formation, demethylation, formation of covalent crosslinks, formation of cystine, formation of
pyroglutamate, formylation, gamma carboxylation, glycosylation, GPI anchor formation,
hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing,
phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated
addition of amino acids to proteins such as arginylation, and ubiquitination.

Such modifications are well known to those of skill in the art and have been described in
great detail in the scientific literature. Several particularly common modifications, glycosylation,
lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and
ADP-ribosylation, for instance, are described in most basic texts, such as Proteins - Structure and
Molecular Properties, 2nd Ed., T.E. Creighton, W. H. Freeman and Company, New York (1993).
Many detailed reviews are available on this subject, such as by Wold, F., Posttranslational Covalent
Modification of Proteins, B.C. Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al.
(Meth. Enzymol. 182:626-646 (1990)) and Rattan et al. (Ann. N.Y. Acad. Sci. 663:46-62 (1992)).

Accordingly, the drug-metabolizing enzyme peptides of the present invention also
encompass derivatives or analogs in which a substituted amino acid residue is not one encoded by
the genetic code, in which a substituent group is included, in which the mature drug-metabolizing
enzyme peptide is fused with another compound, such as a compound to increase the half-life of the
drug-metabolizing enzyme peptide (for example, polyethylene glycol), or in which the additional
amino acids are fused to the mature drug-metabolizing enzyme peptide, such as a leader or secretory
sequence or a sequence for purification of the mature drug-metabolizing enzyme peptide or a
pro-protein sequence.



Protein/Peptide Uses 

The proteins of the present invention can be used in substantial and specific assays
related to the functional information provided in the Figures; to raise antibodies or to elicit
another immune response; as a reagent (including the labeled reagent) in assays designed to
quantitatively determine levels of the protein (or its binding partner or ligand) in biological
fluids; and as markers for tissues in which the corresponding protein is preferentially expressed
(either constitutively or at a particular stage of tissue differentiation or development or in a
disease state). Where the protein binds or potentially binds to another protein or ligand (such as,
for example, in a drug-metabolizing enzyme-effector protein interaction or drug-metabolizing
enzyme-ligand interaction), the protein can be used to identify the binding partner/ligand so as to
develop a system to identify inhibitors of the binding interaction. Any or all of these uses are
capable of being developed into reagent grade or kit format for commercialization as commercial
products.

Methods for performing the uses listed above are well known to those skilled in the art.
References disclosing such methods include "Molecular Cloning: A Laboratory Manual", 2d ed.,
Cold Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989,
and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press,
Berger, S. L. and A. R. Kimmel eds., 1987.

The potential uses of the peptides of the present invention are based primarily on the
source of the protein as well as the class/action of the protein. For example, drug-metabolizing
enzymes isolated from humans and their human/mammalian orthologs serve as targets for
identifying agents for use in mammalian therapeutic applications, e.g. a human drug, particularly
in modulating a biological or pathological response in a cell or tissue that expresses the
drug-metabolizing enzyme. Experimental data as provided in Figure 1 indicates that
drug-metabolizing enzyme proteins of the present invention are expressed in humans in the stomach,
brain (including infant), endometrial tumors, prostate, kidney, adrenal gland tumors, head/neck,
sympathetic trunk, breast, and hepatocellular carcinomas, as indicated by virtual northern blot
analysis. PCR-based tissue screening panels also indicate expression in the brain. A large
percentage of pharmaceutical agents are being developed that modulate the activity of
drug-metabolizing enzyme proteins, particularly members of the omega-hydroxylase cytochrome
P450 subfamily (see Background of the Invention). The structural and functional information
provided in the Background and Figures provides specific and substantial uses for the molecules
of the present invention, particularly in combination with the expression information provided in
Figure 1. Experimental data as provided in Figure 1 indicates expression in humans in the
stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland tumors,
head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. Such uses can readily be
determined using the information provided herein, that which is known in the art, and routine
experimentation.

The drug-metabolizing enzyme polypeptides (including variants and fragments that may
have been disclosed prior to the present invention) are useful for biological assays related to
drug-metabolizing enzymes that are related to members of the omega-hydroxylase cytochrome P450
subfamily. Such assays involve any of the known drug-metabolizing enzyme functions or activities
or properties useful for diagnosis and treatment of drug-metabolizing enzyme-related conditions
that are specific for the subfamily of drug-metabolizing enzymes that the one of the present
invention belongs to, particularly in cells and tissues that express the drug-metabolizing enzyme.
Experimental data as provided in Figure 1 indicates that drug-metabolizing enzyme proteins of the
present invention are expressed in humans in the stomach, brain (including infant), endometrial
tumors, prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast, and
hepatocellular carcinomas, as indicated by virtual northern blot analysis. PCR-based tissue
screening panels also indicate expression in the brain.

The drug-metabolizing enzyme polypeptides are also useful in drug screening assays, in
cell-based or cell-free systems. Cell-based systems can be native, i.e., cells that normally express
the drug-metabolizing enzyme, as a biopsy or expanded in cell culture. Experimental data as
provided in Figure 1 indicates expression in humans in the stomach, brain (including infant),
endometrial tumors, prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast,
and hepatocellular carcinomas. In an alternate embodiment, cell-based assays involve recombinant
host cells expressing the drug-metabolizing enzyme protein.

The polypeptides can be used to identify compounds that modulate drug-metabolizing
enzyme activity of the protein in its natural state or an altered form that causes a specific disease or
pathology associated with the drug-metabolizing enzyme. Both the drug-metabolizing enzymes of
the present invention and appropriate variants and fragments can be used in high-throughput screens
to assay candidate compounds for the ability to bind to the drug-metabolizing enzyme. These
compounds can be further screened against a functional drug-metabolizing enzyme to determine the
effect of the compound on the drug-metabolizing enzyme activity. Further, these compounds can
be tested in animal or invertebrate systems to determine activity/effectiveness. Compounds can be
identified that activate (agonist) or inactivate (antagonist) the drug-metabolizing enzyme to a
desired degree.

Further, the drug-metabolizing enzyme polypeptides can be used to screen a compound for
the ability to stimulate or inhibit interaction between the drug-metabolizing enzyme protein and a
molecule that normally interacts with the drug-metabolizing enzyme protein. Such assays typically
include the steps of combining the drug-metabolizing enzyme protein with a candidate compound
under conditions that allow the drug-metabolizing enzyme protein, or fragment, to interact with the
target molecule, and to detect the formation of a complex between the protein and the target or to
detect the biochemical consequence of the interaction with the drug-metabolizing enzyme protein
and the target.

Candidate compounds include, for example, 1) peptides such as soluble peptides, including
Ig-tailed fusion peptides and members of random peptide libraries (see, e.g., Lam et al., Nature
354:82-84 (1991); Houghten et al., Nature 354:84-86 (1991)) and combinatorial chemistry-derived
molecular libraries made of D- and/or L- configuration amino acids; 2) phosphopeptides (e.g.,
members of random and partially degenerate, directed phosphopeptide libraries, see, e.g., Songyang
et al., Cell 72:767-778 (1993)); 3) antibodies (e.g., polyclonal, monoclonal, humanized,
anti-idiotypic, chimeric, and single chain antibodies as well as Fab, F(ab')2, Fab expression library
fragments, and epitope-binding fragments of antibodies); and 4) small organic and inorganic
molecules (e.g., molecules obtained from combinatorial and natural product libraries).

One candidate compound is a soluble fragment of the receptor that competes for substrate
binding. Other candidate compounds include mutant drug-metabolizing enzymes or appropriate
fragments containing mutations that affect drug-metabolizing enzyme function and thus compete for
substrate. Accordingly, a fragment that competes for substrate, for example with a higher affinity,
or a fragment that binds substrate but does not allow release, is encompassed by the invention.

Any of the biological or biochemical functions mediated by the drug-metabolizing enzyme
can be used as an endpoint assay. These include all of the biochemical or biochemical/biological
events described herein, in the references cited herein, incorporated by reference for these endpoint
assay targets, and other functions known to those of ordinary skill in the art or that can be readily
identified using the information provided in the Figures, particularly Figure 2. Specifically, a
biological function of a cell or tissue that expresses the drug-metabolizing enzyme can be assayed.
Experimental data as provided in Figure 1 indicates that drug-metabolizing enzyme proteins of the
present invention are expressed in humans in the stomach, brain (including infant), endometrial
tumors, prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast, and
hepatocellular carcinomas, as indicated by virtual northern blot analysis. PCR-based tissue
screening panels also indicate expression in the brain.

Binding and/or activating compounds can also be screened by using chimeric drug-metabolizing
enzyme proteins in which the amino terminal extracellular domain, or parts thereof,
the entire transmembrane domain or subregions, such as any of the seven transmembrane segments
or any of the intracellular or extracellular loops and the carboxy terminal intracellular domain, or
parts thereof, can be replaced by heterologous domains or subregions. For example, a
substrate-binding region can be used that interacts with a different substrate than that which is
recognized by the native drug-metabolizing enzyme. Accordingly, a different set of signal transduction
components is available as an end-point assay for activation. This allows for assays to be performed
in other than the specific host cell from which the drug-metabolizing enzyme is derived.

The drug-metabolizing enzyme polypeptides are also useful in competition binding assays
in methods designed to discover compounds that interact with the drug-metabolizing enzyme (e.g.,
binding partners and/or ligands). Thus, a compound is exposed to a drug-metabolizing enzyme
polypeptide under conditions that allow the compound to bind or to otherwise interact with the
polypeptide. Soluble drug-metabolizing enzyme polypeptide is also added to the mixture. If the
test compound interacts with the soluble drug-metabolizing enzyme polypeptide, it decreases the
amount of complex formed or activity from the drug-metabolizing enzyme target. This type of
assay is particularly useful in cases in which compounds are sought that interact with specific
regions of the drug-metabolizing enzyme. Thus, the soluble polypeptide that competes with the
target drug-metabolizing enzyme region is designed to contain peptide sequences corresponding to
the region of interest.
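
As a hedged illustration of how the readout of such a competition assay might be summarized, the
sketch below computes the percent decrease in complex signal when the soluble polypeptide is
present. The function name, the background-subtraction control, and the example numbers are
assumptions made for illustration only; they are not taken from this specification.

```python
def percent_reduction(signal_without, signal_with, background=0.0):
    """Percent decrease in complex signal caused by the soluble competitor.

    signal_without: signal measured without the soluble polypeptide
    signal_with:    signal measured with the soluble polypeptide added
    background:     signal from a no-target control (hypothetical control)
    """
    baseline = signal_without - background
    if baseline <= 0:
        raise ValueError("signal without competitor must exceed background")
    remaining = (signal_with - background) / baseline
    return 100.0 * (1.0 - remaining)


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary units).
    print(percent_reduction(1000.0, 400.0, background=100.0))
```

A value near zero would suggest the test compound does not interact with the region represented by
the soluble polypeptide under the assumed conditions.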

To perform cell-free drug screening assays, it is sometimes desirable to immobilize either
the drug-metabolizing enzyme protein, or fragment, or its target molecule to facilitate separation of
complexes from uncomplexed forms of one or both of the proteins, as well as to accommodate
automation of the assay.

Techniques for immobilizing proteins on matrices can be used in the drug screening assays.
In one embodiment, a fusion protein can be provided which adds a domain that allows the protein to
be bound to a matrix. For example, glutathione-S-transferase fusion proteins can be adsorbed onto
glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione derivatized microtitre
plates, which are then combined with the cell lysates (e.g., 35S-labeled) and the candidate
compound, and the mixture incubated under conditions conducive to complex formation (e.g., at
physiological conditions for salt and pH). Following incubation, the beads are washed to remove
any unbound label, and the matrix immobilized and radiolabel determined directly, or in the
supernatant after the complexes are dissociated. Alternatively, the complexes can be dissociated
from the matrix, separated by SDS-PAGE, and the level of drug-metabolizing enzyme-binding
protein found in the bead fraction quantitated from the gel using standard electrophoretic
techniques. For example, either the polypeptide or its target molecule can be immobilized utilizing
conjugation of biotin and streptavidin using techniques well known in the art. Alternatively,
antibodies reactive with the protein but which do not interfere with binding of the protein to its
target molecule can be derivatized to the wells of the plate, and the protein trapped in the wells by
antibody conjugation. Preparations of a drug-metabolizing enzyme-binding protein and a candidate
compound are incubated in the drug-metabolizing enzyme protein-presenting wells and the amount
of complex trapped in the well can be quantitated. Methods for detecting such complexes, in
addition to those described above for the GST-immobilized complexes, include immunodetection of
complexes using antibodies reactive with the drug-metabolizing enzyme protein target molecule, or
which are reactive with drug-metabolizing enzyme protein and compete with the target molecule, as
well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the
target molecule.

Agents that modulate one of the drug-metabolizing enzymes of the present invention can be
identified using one or more of the above assays, alone or in combination. It is generally preferable
to use a cell-based or cell-free system first and then confirm activity in an animal or other model
system. Such model systems are well known in the art and can readily be employed in this context.

Modulators of drug-metabolizing enzyme protein activity identified according to these drug
screening assays can be used to treat a subject with a disorder mediated by the drug-metabolizing
enzyme pathway, by treating cells or tissues that express the drug-metabolizing enzyme.
Experimental data as provided in Figure 1 indicates expression in humans in the stomach, brain
(including infant), endometrial tumors, prostate, kidney, adrenal gland tumors, head/neck,
sympathetic trunk, breast, and hepatocellular carcinomas. These methods of treatment include the
steps of administering a modulator of drug-metabolizing enzyme activity in a pharmaceutical
composition to a subject in need of such treatment, the modulator being identified as described
herein.

In yet another aspect of the invention, the drug-metabolizing enzyme proteins can be
used as "bait proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent No.
5,283,317; Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-
12054; Bartel et al. (1993) Biotechniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-
1696; and Brent WO94/10300), to identify other proteins, which bind to or interact with the
drug-metabolizing enzyme and are involved in drug-metabolizing enzyme activity. Such
drug-metabolizing enzyme-binding proteins are likely to be drug-metabolizing enzyme inhibitors.

The two-hybrid system is based on the modular nature of most transcription factors,
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two
different DNA constructs. In one construct, the gene that codes for a drug-metabolizing enzyme
protein is fused to a gene encoding the DNA binding domain of a known transcription factor
(e.g., GAL-4). In the other construct, a DNA sequence, from a library of DNA sequences, that
encodes an unidentified protein ("prey" or "sample") is fused to a gene that codes for the
activation domain of the known transcription factor. If the "bait" and the "prey" proteins are able
to interact, in vivo, forming a drug-metabolizing enzyme-dependent complex, the DNA-binding
and activation domains of the transcription factor are brought into close proximity. This
proximity allows transcription of a reporter gene (e.g., LacZ) which is operably linked to a
transcriptional regulatory site responsive to the transcription factor. Expression of the reporter
gene can be detected and cell colonies containing the functional transcription factor can be
isolated and used to obtain the cloned gene which encodes the protein which interacts with the
drug-metabolizing enzyme protein.

This invention further pertains to novel agents identified by the above-described
screening assays. Accordingly, it is within the scope of this invention to further use an agent
identified as described herein in an appropriate animal model. For example, an agent identified
as described herein (e.g., a drug-metabolizing enzyme-modulating agent, an antisense
drug-metabolizing enzyme nucleic acid molecule, a drug-metabolizing enzyme-specific antibody, or a
drug-metabolizing enzyme-binding partner) can be used in an animal or other model to
determine the efficacy, toxicity, or side effects of treatment with such an agent. Alternatively, an
agent identified as described herein can be used in an animal or other model to determine the
mechanism of action of such an agent. Furthermore, this invention pertains to uses of novel
agents identified by the above-described screening assays for treatments as described herein.

The drug-metabolizing enzyme proteins of the present invention are also useful to provide a
target for diagnosing a disease or predisposition to disease mediated by the peptide. Accordingly,
the invention provides methods for detecting the presence, or levels of, the protein (or encoding
mRNA) in a cell, tissue, or organism. Experimental data as provided in Figure 1 indicates
expression in humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney,
adrenal gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. The
method involves contacting a biological sample with a compound capable of interacting with the
drug-metabolizing enzyme protein such that the interaction can be detected. Such an assay can be
provided in a single detection format or a multi-detection format such as an antibody chip array.

One agent for detecting a protein in a sample is an antibody capable of selectively binding to
the protein. A biological sample includes tissues, cells and biological fluids isolated from a subject,
as well as tissues, cells and fluids present within a subject.

The peptides of the present invention also provide targets for diagnosing active protein
activity, disease, or predisposition to disease, in a patient having a variant peptide, particularly
activities and conditions that are known for other members of the family of proteins to which the
present one belongs. Thus, the peptide can be isolated from a biological sample and assayed for the
presence of a genetic mutation that results in aberrant peptide. This includes amino acid
substitution, deletion, insertion, rearrangement (as the result of aberrant splicing events), and
inappropriate post-translational modification. Analytic methods include altered electrophoretic
mobility, altered tryptic peptide digest, altered drug-metabolizing enzyme activity in cell-based or
cell-free assay, alteration in substrate or antibody-binding pattern, altered isoelectric point, direct
amino acid sequencing, and any other of the known assay techniques useful for detecting mutations
in a protein. Such an assay can be provided in a single detection format or a multi-detection format
such as an antibody chip array.

In vitro techniques for detection of peptide include enzyme linked immunosorbent assays
(ELISAs), Western blots, immunoprecipitations and immunofluorescence using a detection reagent,
such as an antibody or protein binding agent. Alternatively, the peptide can be detected in vivo in a
subject by introducing into the subject a labeled anti-peptide antibody or other types of detection
agent. For example, the antibody can be labeled with a radioactive marker whose presence and
location in a subject can be detected by standard imaging techniques. Particularly useful are
methods that detect the allelic variant of a peptide expressed in a subject and methods which detect
fragments of a peptide in a sample.

The peptides are also useful in pharmacogenomic analysis. Pharmacogenomics deals with
clinically significant hereditary variations in the response to drugs due to altered drug disposition
and abnormal action in affected persons. See, e.g., Eichelbaum, M. (Clin. Exp. Pharmacol. Physiol.
23(10-11):983-985 (1996)), and Linder, M.W. (Clin. Chem. 43(2):254-266 (1997)). The clinical
outcomes of these variations result in severe toxicity of therapeutic drugs in certain individuals or
therapeutic failure of drugs in certain individuals as a result of individual variation in metabolism.
Thus, the genotype of the individual can determine the way a therapeutic compound acts on the
body or the way the body metabolizes the compound. Further, the activity of drug metabolizing
enzymes affects both the intensity and duration of drug action. Thus, the pharmacogenomics of the
individual permit the selection of effective compounds and effective dosages of such compounds for
prophylactic or therapeutic treatment based on the individual's genotype. The discovery of genetic
polymorphisms in some drug metabolizing enzymes has explained why some patients do not obtain
the expected drug effects, show an exaggerated drug effect, or experience serious toxicity from
standard drug dosages. Polymorphisms can be expressed in the phenotype of the extensive
metabolizer and the phenotype of the poor metabolizer. Accordingly, genetic polymorphism may
lead to allelic protein variants of the drug-metabolizing enzyme protein in which one or more of the
drug-metabolizing enzyme functions in one population is different from those in another population.
The peptides thus allow a target to ascertain a genetic predisposition that can affect treatment
modality. Thus, in a ligand-based treatment, polymorphism may give rise to amino terminal
extracellular domains and/or other substrate-binding regions that are more or less active in substrate
binding, and drug-metabolizing enzyme activation. Accordingly, substrate dosage would
necessarily be modified to maximize the therapeutic effect within a given population containing a
polymorphism. As an alternative to genotyping, specific polymorphic peptides could be identified.

The peptides are also useful for treating a disorder characterized by an absence of,
inappropriate, or unwanted expression of the protein. Experimental data as provided in Figure 1
indicates expression in humans in the stomach, brain (including infant), endometrial tumors,
prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular
carcinomas. Accordingly, methods for treatment include the use of the drug-metabolizing enzyme
protein or fragments.

Antibodies

The invention also provides antibodies that selectively bind to one of the peptides of the
present invention, a protein comprising such a peptide, as well as variants and fragments thereof.
As used herein, an antibody selectively binds a target peptide when it binds the target peptide and
does not significantly bind to unrelated proteins. An antibody is still considered to selectively bind
a peptide even if it also binds to other proteins that are not substantially homologous with the target
peptide so long as such proteins share homology with a fragment or domain of the peptide target of
the antibody. In this case, it would be understood that antibody binding to the peptide is still
selective despite some degree of cross-reactivity.

As used herein, an antibody is defined in terms consistent with that recognized within the
art: antibodies are multi-subunit proteins produced by a mammalian organism in response to an
antigen challenge. The antibodies of the present invention include polyclonal antibodies and
monoclonal antibodies, as well as fragments of such antibodies, including, but not limited to, Fab or
F(ab')2, and Fv fragments.

Many methods are known for generating and/or identifying antibodies to a given target
peptide. Several such methods are described by Harlow, Antibodies, Cold Spring Harbor Press,
(1989).

In general, to generate antibodies, an isolated peptide is used as an immunogen and is
administered to a mammalian organism, such as a rat, rabbit or mouse. The full-length protein, an
antigenic peptide fragment or a fusion protein can be used. Particularly important fragments are
those covering functional domains, such as the domains identified in Figure 2, and domains of
sequence homology or divergence amongst the family, such as those that can readily be identified
using protein alignment methods and as presented in the Figures.

Antibodies are preferably prepared from regions or discrete fragments of the
drug-metabolizing enzyme proteins. Antibodies can be prepared from any region of the peptide as
described herein. However, preferred regions will include those involved in function/activity
and/or drug-metabolizing enzyme/binding partner interaction. Figure 2 can be used to identify
particularly important regions while sequence alignment can be used to identify conserved and
unique sequence fragments.

An antigenic fragment will typically comprise at least 8 contiguous amino acid residues.
The antigenic peptide can comprise, however, at least 10, 12, 14, 16 or more amino acid residues.
Such fragments can be selected on a physical property, such as fragments that correspond to regions
that are located on the surface of the protein, e.g., hydrophilic regions, or can be selected based on
sequence uniqueness (see Figure 2).
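
One way the selection of hydrophilic candidate fragments could be sketched is a sliding-window
hydropathy scan. The example below is illustrative only: it assumes the Kyte-Doolittle hydropathy
scale (which this specification does not name), a window equal to the 8-residue minimum stated
above, and a hypothetical function name. It ranks windows by mean hydropathy and makes no claim
about actual surface exposure or antigenicity.

```python
# Kyte-Doolittle hydropathy values (more negative = more hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def rank_hydrophilic_windows(protein, window=8):
    """Return (start, fragment, mean_hydropathy) tuples, most hydrophilic first."""
    if window < 8:
        raise ValueError("the text calls for at least 8 contiguous residues")
    protein = protein.upper()
    scored = []
    # Slide a fixed-length window along the sequence and average the scale values.
    for start in range(len(protein) - window + 1):
        fragment = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in fragment) / window
        scored.append((start, fragment, mean))
    return sorted(scored, key=lambda item: item[2])


if __name__ == "__main__":
    # Toy sequence for illustration; not taken from Figure 2.
    for start, fragment, mean in rank_hydrophilic_windows("IIIIIIIIKKKKKKKK")[:3]:
        print(start, fragment, round(mean, 2))
```

Candidates ranked this way would still need to be checked for sequence uniqueness against related
family members, as the paragraph above notes.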

Detection of an antibody of the present invention can be facilitated by coupling (i.e.,
physically linking) the antibody to a detectable substance. Examples of detectable substances
include various enzymes, prosthetic groups, fluorescent materials, luminescent materials,
bioluminescent materials, and radioactive materials. Examples of suitable enzymes include
horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of
suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of
suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate,
rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a
luminescent material includes luminol; examples of bioluminescent materials include luciferase,
luciferin, and aequorin, and examples of suitable radioactive material include 125I, 131I, 35S or 3H.

Antibody Uses 

The antibodies can be used to isolate one of the proteins of the present invention by standard
techniques, such as affinity chromatography or immunoprecipitation. The antibodies can facilitate
the purification of the natural protein from cells and recombinantly produced protein expressed in
host cells. In addition, such antibodies are useful to detect the presence of one of the proteins of the
present invention in cells or tissues to determine the pattern of expression of the protein among
various tissues in an organism and over the course of normal development. Experimental data as
provided in Figure 1 indicates that drug-metabolizing enzyme proteins of the present invention are
expressed in humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney,
adrenal gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas, as
indicated by virtual northern blot analysis. PCR-based tissue screening panels also indicate
expression in the brain. Further, such antibodies can be used to detect protein in situ, in vitro, or in
a cell lysate or supernatant in order to evaluate the abundance and pattern of expression. Also, such
antibodies can be used to assess abnormal tissue distribution or abnormal expression during
development or progression of a biological condition. Antibody detection of circulating fragments
of the full length protein can be used to identify turnover.

Further, the antibodies can be used to assess expression in disease states such as in active
stages of the disease or in an individual with a predisposition toward disease related to the protein's
function. When a disorder is caused by an inappropriate tissue distribution, developmental
expression, level of expression of the protein, or expressed/processed form, the antibody can be
prepared against the normal protein. Experimental data as provided in Figure 1 indicates expression
in humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal
gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. If a disorder is
characterized by a specific mutation in the protein, antibodies specific for this mutant protein can be
used to assay for the presence of the specific mutant protein.

The antibodies can also be used to assess normal and aberrant subcellular localization of
cells in the various tissues in an organism. Experimental data as provided in Figure 1 indicates
expression in humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney,
adrenal gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. The
diagnostic uses can be applied, not only in genetic testing, but also in monitoring a treatment
modality. Accordingly, where treatment is ultimately aimed at correcting expression level or the
presence of aberrant sequence and aberrant tissue distribution or developmental expression,
antibodies directed against the protein or relevant fragments can be used to monitor therapeutic
efficacy.

Additionally, antibodies are useful in pharmacogenomic analysis. Thus, antibodies prepared
against polymorphic proteins can be used to identify individuals that require modified treatment
modalities. The antibodies are also useful as diagnostic tools as an immunological marker for
aberrant protein analyzed by electrophoretic mobility, isoelectric point, tryptic peptide digest, and
other physical assays known to those in the art.

The antibodies are also useful for tissue typing. Experimental data as provided in Figure 1
indicates expression in humans in the stomach, brain (including infant), endometrial tumors,
prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular
carcinomas. Thus, where a specific protein has been correlated with expression in a specific tissue,
antibodies that are specific for this protein can be used to identify a tissue type.

The antibodies are also useful for inhibiting protein function, for example, blocking the
binding of the drug-metabolizing enzyme peptide to a binding partner such as a substrate. These
uses can also be applied in a therapeutic context in which treatment involves inhibiting the protein's
function. An antibody can be used, for example, to block binding, thus modulating (agonizing or
antagonizing) the peptide's activity. Antibodies can be prepared against specific fragments
containing sites required for function or against intact protein that is associated with a cell or cell
membrane. See Figure 2 for structural information relating to the proteins of the present invention.

The invention also encompasses kits for using antibodies to detect the presence of a protein
in a biological sample. The kit can comprise antibodies such as a labeled or labelable antibody and
a compound or agent for detecting protein in a biological sample; means for determining the amount
of protein in the sample; means for comparing the amount of protein in the sample with a standard;
and instructions for use. Such a kit can be supplied to detect a single protein or epitope or can be
configured to detect one of a multitude of epitopes, such as in an antibody detection array. Arrays
are described in detail below for nucleic acid arrays and similar methods have been developed for
antibody arrays.

Nucleic Acid Molecules 

The present invention further provides isolated nucleic acid molecules that encode a
drug-metabolizing enzyme peptide or protein of the present invention (cDNA, transcript and genomic
sequence). Such nucleic acid molecules will consist of, consist essentially of, or comprise a
nucleotide sequence that encodes one of the drug-metabolizing enzyme peptides of the present
invention, an allelic variant thereof, or an ortholog or paralog thereof.

As used herein, an "isolated" nucleic acid molecule is one that is separated from other
nucleic acid present in the natural source of the nucleic acid. Preferably, an "isolated" nucleic acid
is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends
of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived.
However, there can be some flanking nucleotide sequences, for example up to about 5KB, 4KB,
3KB, 2KB, or 1KB or less, particularly contiguous peptide encoding sequences and peptide
encoding sequences within the same gene but separated by introns in the genomic sequence. The
important point is that the nucleic acid is isolated from remote and unimportant flanking sequences
such that it can be subjected to the specific manipulations described herein such as recombinant
expression, preparation of probes and primers, and other uses specific to the nucleic acid sequences.

Moreover, an "isolated" nucleic acid molecule, such as a transcript/cDNA molecule, can be
substantially free of other cellular material, or culture medium when produced by recombinant
techniques, or chemical precursors or other chemicals when chemically synthesized. However, the
nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered
isolated.

For example, recombinant DNA molecules contained in a vector are considered isolated.
Further examples of isolated DNA molecules include recombinant DNA molecules maintained in
heterologous host cells or purified (partially or substantially) DNA molecules in solution. Isolated
RNA molecules include in vivo or in vitro RNA transcripts of the isolated DNA molecules of the
present invention. Isolated nucleic acid molecules according to the present invention further include
such molecules produced synthetically.

Accordingly, the present invention provides nucleic acid molecules that consist of the
nucleotide sequence shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2,
SEQ ID NO:2. A nucleic acid molecule consists of a nucleotide sequence when the nucleotide
sequence is the complete nucleotide sequence of the nucleic acid molecule.

The present invention further provides nucleic acid molecules that consist essentially of the
nucleotide sequence shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2,
SEQ ID NO:2. A nucleic acid molecule consists essentially of a nucleotide sequence when such a
nucleotide sequence is present with only a few additional nucleic acid residues in the final nucleic
acid molecule.

The present invention further provides nucleic acid molecules that comprise the nucleotide
sequences shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic
sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2, SEQ ID
NO:2. A nucleic acid molecule comprises a nucleotide sequence when the nucleotide sequence is at
least part of the final nucleotide sequence of the nucleic acid molecule. In such a fashion, the
nucleic acid molecule can be only the nucleotide sequence or have additional nucleic acid residues,
such as nucleic acid residues that are naturally associated with it or heterologous nucleotide
sequences. Such a nucleic acid molecule can have a few additional nucleotides or can comprise
several hundred or more additional nucleotides. A brief description of how various types of these
nucleic acid molecules can be readily made/isolated is provided below.
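
The three definitions above ("consists of", "consists essentially of", "comprises") can be read as
increasingly permissive relationships between a molecule and a reference sequence. The sketch
below is one loose, illustrative reading, not a legal test: the function name is hypothetical, and
the cutoff `few` for "only a few additional residues" is an assumption, since the specification
gives no number. It reports the narrowest category that applies; a molecule that consists of the
sequence also comprises it.

```python
def classify_claim_scope(molecule, reference, few=10):
    """Classify a molecule relative to a reference nucleotide sequence.

    `few` is a hypothetical cutoff for "only a few additional residues";
    the specification does not give a number.
    """
    molecule, reference = molecule.upper(), reference.upper()
    if reference not in molecule:
        return "does not contain the sequence"
    extra = len(molecule) - len(reference)
    if extra == 0:
        # The reference is the complete sequence of the molecule.
        return "consists of"
    if extra <= few:
        # Reference present with only a few additional residues.
        return "consists essentially of"
    # Reference is at least part of a longer molecule.
    return "comprises"


if __name__ == "__main__":
    ref = "ATGGCC"  # toy reference, not SEQ ID NO:1
    for mol in ("ATGGCC", "TTATGGCCAA", "A" * 20 + "ATGGCC"):
        print(len(mol), classify_claim_scope(mol, ref))
```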

In Figures 1 and 3, both coding and non-coding sequences are provided. Because of the
source of the present invention, human genomic sequence (Figure 3) and cDNA/transcript
sequences (Figure 1), the nucleic acid molecules in the Figures will contain genomic intronic
sequences, 5' and 3' non-coding sequences, gene regulatory regions and non-coding intergenic
sequences. In general such sequence features are either noted in Figures 1 and 3 or can readily
be identified using computational tools known in the art. As discussed below, some of the
non-coding regions, particularly gene regulatory elements such as promoters, are useful for a variety
of purposes, e.g. control of heterologous gene expression, target for identifying gene activity
modulating compounds, and are particularly claimed as fragments of the genomic sequence
provided herein.

The isolated nucleic acid molecules can encode the mature protein plus additional amino or
carboxyl-terminal amino acids, or amino acids interior to the mature peptide (when the mature form
has more than one peptide chain, for instance). Such sequences may play a role in processing of a
protein from precursor to a mature form, facilitate protein trafficking, prolong or shorten protein
half-life or facilitate manipulation of a protein for assay or production, among other things. As
generally is the case in situ, the additional amino acids may be processed away from the mature
protein by cellular enzymes.

As mentioned above, the isolated nucleic acid molecules include, but are not limited to, the
sequence encoding the drug-metabolizing enzyme peptide alone, the sequence encoding the mature
peptide and additional coding sequences, such as a leader or secretory sequence (e.g., a pre-pro or
pro-protein sequence), the sequence encoding the mature peptide, with or without the additional
coding sequences, plus additional non-coding sequences, for example introns and non-coding 5' and
3' sequences such as transcribed but non-translated sequences that play a role in transcription,
mRNA processing (including splicing and polyadenylation signals), ribosome binding and stability
of mRNA. In addition, the nucleic acid molecule may be fused to a marker sequence encoding, for
example, a peptide that facilitates purification.

Isolated nucleic acid molecules can be in the form of RNA, such as mRNA, or in the form of
DNA, including cDNA and genomic DNA obtained by cloning or produced by chemical synthetic
techniques or by a combination thereof. The nucleic acid, especially DNA, can be double-stranded
or single-stranded. Single-stranded nucleic acid can be the coding strand (sense strand) or the
non-coding strand (anti-sense strand).

The invention further provides nucleic acid molecules that encode fragments of the peptides
of the present invention as well as nucleic acid molecules that encode obvious variants of the
drug-metabolizing enzyme proteins of the present invention that are described above. Such nucleic acid
molecules may be naturally occurring, such as allelic variants (same locus), paralogs (different
locus), and orthologs (different organism), or may be constructed by recombinant DNA methods or
by chemical synthesis. Such non-naturally occurring variants may be made by mutagenesis
techniques, including those applied to nucleic acid molecules, cells, or organisms. Accordingly, as
discussed above, the variants can contain nucleotide substitutions, deletions, inversions and
insertions. Variation can occur in either or both the coding and non-coding regions. The variations
can produce both conservative and non-conservative amino acid substitutions.

The present invention further provides non-coding fragments of the nucleic acid molecules
provided in Figures 1 and 3. Preferred non-coding fragments include, but are not limited to,
promoter sequences, enhancer sequences, gene modulating sequences and gene termination
sequences. Such fragments are useful in controlling heterologous gene expression and in
developing screens to identify gene-modulating agents. A promoter can readily be identified as
being 5' to the ATG start site in the genomic sequence provided in Figure 3.

A fragment comprises a contiguous nucleotide sequence of 12 or more
nucleotides. Further, a fragment could be at least 30, 40, 50, 100, 250 or 500 nucleotides in length.
The length of the fragment will be based on its intended use. For example, the fragment can encode
epitope-bearing regions of the peptide, or can be useful as DNA probes and primers. Such
fragments can be isolated using the known nucleotide sequence to synthesize an oligonucleotide
probe. A labeled probe can then be used to screen a cDNA library, genomic DNA library, or
mRNA to isolate nucleic acid corresponding to the coding region. Further, primers can be used in
PCR reactions to clone specific regions of the gene.
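
As a rough illustration of how candidate fragments of a given length might be listed from a known sequence, the following Python sketch enumerates every contiguous window of 12 or more nucleotides. The function name and the 24-nucleotide toy sequence are hypothetical and are not taken from the Figures.

```python
def contiguous_fragments(sequence, length=12):
    """Return (start, fragment) pairs for every contiguous window of `length`.

    `start` is a 0-based offset into the input sequence.
    """
    if length < 12:
        raise ValueError("this sketch treats a fragment as 12 or more nucleotides")
    seq = sequence.upper()
    return [(i, seq[i:i + length]) for i in range(len(seq) - length + 1)]


# Hypothetical 24-nucleotide toy sequence (not from the Figures).
toy_sequence = "ATGGCTGGAGACGCGCTGGGCGCG"
fragments = contiguous_fragments(toy_sequence, 12)
```

Whether a listed fragment would be a suitable probe or primer depends on further criteria (uniqueness, base composition, intended use) that this sketch does not evaluate.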

A probe/primer typically comprises a substantially purified oligonucleotide or
oligonucleotide pair. The oligonucleotide typically comprises a region of nucleotide sequence that
hybridizes under stringent conditions to at least about 12, 20, 25, 40, 50 or more consecutive
nucleotides.

Orthologs, homologs, and allelic variants can be identified using methods well known in the
art. As described in the Peptide Section, these variants comprise a nucleotide sequence encoding a
peptide that is typically 60-70%, 70-80%, 80-90%, and more typically at least about 90-95% or
more homologous to the nucleotide sequence shown in the Figure sheets or a fragment of this
sequence. Such nucleic acid molecules can readily be identified as being able to hybridize under
moderate to stringent conditions, to the nucleotide sequence shown in the Figure sheets or a
fragment of the sequence. Allelic variants can readily be determined by genetic locus of the
encoding gene. The gene encoding the novel drug-metabolizing protein of the present invention is
located on a genome component that has been mapped to human chromosome 1 (as indicated in
Figure 3), which is supported by multiple lines of evidence, such as STS and BAC map data.

Figure 3 provides SNP information that has been found in the gene encoding the drug- 
metabolizing proteins of the present invention. SNPs, including insertion/deletion variants 
("indels"), were identified at 45 different nucleotide positions. Changes in the amino acid sequence 
caused by these SNPs can readily be determined using the universal genetic code and the protein
sequence provided in Figure 2 as a reference. Positioning of each SNP in exons, introns, or outside
the ORF can readily be determined using the DNA positions given for each SNP and the start/stop, 
exon, and intron coordinates given in the features. 
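
The determination of an amino acid change from a SNP using the universal genetic code can be sketched in Python as follows. This is a minimal illustration only: it handles single-nucleotide substitutions within a coding sequence (not indels), the function name `snp_effect` is hypothetical, and the short coding sequence used is a toy example rather than the sequence of the Figures.

```python
from itertools import product

# Standard (universal) genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snp_effect(cds, cds_pos, alt_base):
    """Classify a substitution at 0-based position `cds_pos` of an in-frame CDS.

    Returns (ref_codon, alt_codon, ref_aa, alt_aa, kind), where kind is
    "synonymous", "missense" or "nonsense".
    """
    cds = cds.upper()
    codon_start = (cds_pos // 3) * 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = cds_pos - codon_start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return ref_codon, alt_codon, ref_aa, alt_aa, kind


# Toy coding sequence ATG GCT GGA GAC (Met-Ala-Gly-Asp); hypothetical example.
result = snp_effect("ATGGCTGGAGAC", 4, "T")  # GCT -> GTT, Ala -> Val
```

To apply this to a genomic SNP position, the position would first have to be mapped into the coding sequence using the start/stop and exon coordinates, a step that is omitted here.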

As used herein, the term "hybridizes under stringent conditions" is intended to describe
conditions for hybridization and washing under which nucleotide sequences encoding a peptide at
least 60-70% homologous to each other typically remain hybridized to each other. The conditions
can be such that sequences at least about 60%, at least about 70%, or at least about 80% or more
homologous to each other typically remain hybridized to each other. Such stringent conditions are
known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John
Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. One example of stringent hybridization conditions is
hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more
washes in 0.2X SSC, 0.1% SDS at 50-65°C. Examples of moderate to low stringency hybridization
conditions are well known in the art.
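
For orientation, the percentage thresholds above can be related to a simple identity calculation. The Python sketch below computes percent identity between two sequences that are assumed to be already aligned and of equal length; this is a simplification, since real homology comparisons require an alignment step and actual hybridization behavior also depends on salt, temperature and sequence composition. The function name and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Two hypothetical 10-nucleotide sequences differing at two positions.
identity = percent_identity("ATGGCTGGAG", "ATGACTGGTG")
```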

Nucleic Acid Molecule Uses 

The nucleic acid molecules of the present invention are useful for probes, primers, chemical
intermediates, and in biological assays. The nucleic acid molecules are useful as a hybridization
probe for messenger RNA, transcript/cDNA and genomic DNA to isolate full-length cDNA and
genomic clones encoding the peptide described in Figure 2 and to isolate cDNA and genomic
clones that correspond to variants (alleles, orthologs, etc.) producing the same or related peptides
shown in Figure 2. As illustrated in Figure 3, SNPs were identified at 45 different nucleotide
positions.

The probe can correspond to any sequence along the entire length of the nucleic acid
molecules provided in the Figures. Accordingly, it could be derived from 5' noncoding regions, the
coding region, and 3' noncoding regions. However, as discussed, fragments are not to be construed
as encompassing fragments disclosed prior to the present invention.

The nucleic acid molecules are also useful as primers for PCR to amplify any given region
of a nucleic acid molecule and are useful to synthesize antisense molecules of desired length and
sequence.

The nucleic acid molecules are also useful for constructing recombinant vectors. Such
vectors include expression vectors that express a portion of, or all of, the peptide sequences.
Vectors also include insertion vectors, used to integrate into another nucleic acid molecule
sequence, such as into the cellular genome, to alter in situ expression of a gene and/or gene product.
For example, an endogenous coding sequence can be replaced via homologous recombination with
all or part of the coding region containing one or more specifically introduced mutations.

The nucleic acid molecules are also useful for expressing antigenic portions of the proteins.

The nucleic acid molecules are also useful as probes for determining the chromosomal
positions of the nucleic acid molecules by means of in situ hybridization methods. The gene
encoding the novel drug-metabolizing protein of the present invention is located on a genome
component that has been mapped to human chromosome 1 (as indicated in Figure 3), which is
supported by multiple lines of evidence, such as STS and BAC map data.

The nucleic acid molecules are also useful in making vectors containing the gene regulatory
regions of the nucleic acid molecules of the present invention.

The nucleic acid molecules are also useful for designing ribozymes corresponding to all, or
a part, of the mRNA produced from the nucleic acid molecules described herein.

The nucleic acid molecules are also useful for making vectors that express part, or all, of the 
peptides. 

The nucleic acid molecules are also useful for constructing host cells expressing a part, or
all, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful for constructing transgenic animals expressing
all, or a part, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful as hybridization probes for determining the
presence, level, form and distribution of nucleic acid expression. Experimental data as provided in
Figure 1 indicates that drug-metabolizing enzyme proteins of the present invention are expressed in
humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland
tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas, as indicated by virtual
northern blot analysis. PCR-based tissue screening panels also indicate expression in the brain.
Accordingly, the probes can be used to detect the presence of, or to determine levels of, a specific
nucleic acid molecule in cells, tissues, and in organisms. The nucleic acid whose level is
determined can be DNA or RNA. Accordingly, probes corresponding to the peptides described
herein can be used to assess expression and/or gene copy number in a given cell, tissue, or
organism. These uses are relevant for diagnosis of disorders involving an increase or decrease in
drug-metabolizing enzyme protein expression relative to normal results.

In vitro techniques for detection of mRNA include Northern hybridizations and in situ 
hybridizations. In vitro techniques for detecting DNA include Southern hybridizations and in situ
hybridization.

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues that
express a drug-metabolizing enzyme protein, such as by measuring a level of a drug-metabolizing
enzyme-encoding nucleic acid in a sample of cells from a subject, e.g., mRNA or genomic DNA, or
determining if a drug-metabolizing enzyme gene has been mutated. Experimental data as provided
in Figure 1 indicates that drug-metabolizing enzyme proteins of the present invention are expressed
in humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal
gland tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas, as indicated by
virtual northern blot analysis. PCR-based tissue screening panels also indicate expression in the
brain.

Nucleic acid expression assays are useful for drug screening to identify compounds that
modulate drug-metabolizing enzyme nucleic acid expression.

The invention thus provides a method for identifying a compound that can be used to treat a
disorder associated with nucleic acid expression of the drug-metabolizing enzyme gene, particularly
biological and pathological processes that are mediated by the drug-metabolizing enzyme in cells
and tissues that express it. Experimental data as provided in Figure 1 indicates expression in humans
in the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland tumors,
head/neck, sympathetic trunk, breast, and hepatocellular carcinomas. The method typically includes
assaying the ability of the compound to modulate the expression of the drug-metabolizing enzyme
nucleic acid and thus identifying a compound that can be used to treat a disorder characterized by
undesired drug-metabolizing enzyme nucleic acid expression. The assays can be performed in
cell-based and cell-free systems. Cell-based assays include cells naturally expressing the
drug-metabolizing enzyme nucleic acid or recombinant cells genetically engineered to express specific
nucleic acid sequences.

Thus, modulators of drug-metabolizing enzyme gene expression can be identified in a
method wherein a cell is contacted with a candidate compound and the expression of mRNA
determined. The level of expression of drug-metabolizing enzyme mRNA in the presence of the
candidate compound is compared to the level of expression of drug-metabolizing enzyme mRNA in
the absence of the candidate compound. The candidate compound can then be identified as a
modulator of nucleic acid expression based on this comparison and be used, for example, to treat a
disorder characterized by aberrant nucleic acid expression. When expression of mRNA is
statistically significantly greater in the presence of the candidate compound than in its absence, the
candidate compound is identified as a stimulator of nucleic acid expression. When nucleic acid
expression is statistically significantly less in the presence of the candidate compound than in its
absence, the candidate compound is identified as an inhibitor of nucleic acid expression.
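
The comparison described above can be sketched as follows. The text does not specify a statistical test; the Python example below assumes, purely for illustration, an exact two-sided permutation test on the difference of mean expression levels, with a hypothetical significance threshold of 0.05 and made-up replicate measurements.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, untreated):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(treated) + list(untreated)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(untreated))
    total = extreme = 0
    # Enumerate every way of relabeling n_treated of the pooled values as "treated".
    for chosen in combinations(range(len(pooled)), n_treated):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def classify_compound(treated, untreated, alpha=0.05):
    """Label a candidate compound from mRNA levels with and without the compound."""
    if permutation_p_value(treated, untreated) >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(untreated) else "inhibitor"


# Hypothetical relative mRNA levels (arbitrary units), four replicates each.
with_compound = [2.1, 2.4, 2.2, 2.6]
without_compound = [1.0, 1.1, 0.9, 1.2]
label = classify_compound(with_compound, without_compound)
```

An exact permutation test is only practical for small replicate counts, because the number of relabelings grows combinatorially; other tests could equally be used.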

The invention further provides methods of treatment, with the nucleic acid as a target, using
a compound identified through drug screening as a gene modulator to modulate drug-metabolizing
enzyme nucleic acid expression in cells and tissues that express the drug-metabolizing enzyme.
Experimental data as provided in Figure 1 indicates that drug-metabolizing enzyme proteins of the
present invention are expressed in humans in the stomach, brain (including infant), endometrial
tumors, prostate, kidney, adrenal gland tumors, head/neck, sympathetic trunk, breast, and
hepatocellular carcinomas, as indicated by virtual northern blot analysis. PCR-based tissue
screening panels also indicate expression in the brain. Modulation includes both up-regulation (i.e.,
activation or agonization) and down-regulation (i.e., suppression or antagonization) of nucleic acid
expression.

Alternatively, a modulator for drug-metabolizing enzyme nucleic acid expression can be a
small molecule or drug identified using the screening assays described herein as long as the drug or
small molecule inhibits the drug-metabolizing enzyme nucleic acid expression in the cells and
tissues that express the protein. Experimental data as provided in Figure 1 indicates expression in
humans in the stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland
tumors, head/neck, sympathetic trunk, breast, and hepatocellular carcinomas.

The nucleic acid molecules are also useful for monitoring the effectiveness of modulating
compounds on the expression or activity of the drug-metabolizing enzyme gene in clinical trials or
in a treatment regimen. Thus, the gene expression pattern can serve as a barometer for the
continuing effectiveness of treatment with the compound, particularly with compounds to which a
patient can develop resistance. The gene expression pattern can also serve as a marker indicative of
a physiological response of the affected cells to the compound. Accordingly, such monitoring
would allow either increased administration of the compound or the administration of alternative
compounds to which the patient has not become resistant. Similarly, if the level of nucleic acid
expression falls below a desirable level, administration of the compound could be commensurately
decreased.

The nucleic acid molecules are also useful in diagnostic assays for qualitative changes in
drug-metabolizing enzyme nucleic acid expression, and particularly in qualitative changes that lead
to pathology. The nucleic acid molecules can be used to detect mutations in drug-metabolizing
enzyme genes and gene expression products such as mRNA. The nucleic acid molecules can be
used as hybridization probes to detect naturally occurring genetic mutations in the
drug-metabolizing enzyme gene and thereby to determine whether a subject with the mutation is at risk
for a disorder caused by the mutation. Mutations include deletion, addition, or substitution of one or
more nucleotides in the gene, chromosomal rearrangement, such as inversion or transposition,
modification of genomic DNA, such as aberrant methylation patterns or changes in gene copy
number, such as amplification. Detection of a mutated form of the drug-metabolizing enzyme gene
associated with a dysfunction provides a diagnostic tool for an active disease or susceptibility to
disease when the disease results from overexpression, underexpression, or altered expression of a
drug-metabolizing enzyme protein.

Individuals carrying mutations in the drug-metabolizing enzyme gene can be detected at the
nucleic acid level by a variety of techniques. Figure 3 provides SNP information that has been
found in the gene encoding the drug-metabolizing proteins of the present invention. SNPs, including
insertion/deletion variants ("indels"), were identified at 45 different nucleotide positions. Changes
in the amino acid sequence caused by these SNPs can readily be determined using the universal
genetic code and the protein sequence provided in Figure 2 as a reference. Positioning of each SNP
in exons, introns, or outside the ORF can readily be determined using the DNA positions given for
each SNP and the start/stop, exon, and intron coordinates given in the features. The gene encoding
the novel drug-metabolizing protein of the present invention is located on a genome component that
has been mapped to human chromosome 1 (as indicated in Figure 3), which is supported by
multiple lines of evidence, such as STS and BAC map data. Genomic DNA can be analyzed
directly or can be amplified by using PCR prior to analysis. RNA or cDNA can be used in the same
way. In some uses, detection of the mutation involves the use of a probe/primer in a polymerase
chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such as anchor PCR or
RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g., Landegran et al., Science
241:1077-1080 (1988); and Nakazawa et al., PNAS 91:360-364 (1994)), the latter of which can be
particularly useful for detecting point mutations in the gene (see Abravaya et al., Nucleic Acids Res.
23:675-682 (1995)). This method can include the steps of collecting a sample of cells from a
patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells of the sample,
contacting the nucleic acid sample with one or more primers which specifically hybridize to a gene
under conditions such that hybridization and amplification of the gene (if present) occurs, and
detecting the presence or absence of an amplification product, or detecting the size of the
amplification product and comparing the length to a control sample. Deletions and insertions can be
detected by a change in size of the amplified product compared to the normal genotype. Point
mutations can be identified by hybridizing amplified DNA to normal RNA or antisense DNA
sequences.

Alternatively, mutations in a drug-metabolizing enzyme gene can be directly identified, for 
example, by alterations in restriction enzyme digestion patterns determined by gel electrophoresis. 
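
To illustrate how a mutation can alter a restriction digestion pattern, the Python sketch below computes the fragment lengths expected from cutting a linear sequence at each EcoRI site (G^AATTC). The wild-type and mutant sequences are hypothetical toy constructs, not sequences from the Figures, and the function `digest` is an illustrative helper rather than part of any library.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the strand is cut
    (1 for EcoRI, which cuts between G and AATTC).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"

# Hypothetical 44-nucleotide toy sequences; the mutant carries a single
# A-to-G substitution that destroys the second EcoRI site.
wild_type = "A" * 10 + "GAATTC" + "C" * 14 + "GAATTC" + "T" * 8
mutant = "A" * 10 + "GAATTC" + "C" * 14 + "GAGTTC" + "T" * 8

wild_type_fragments = digest(wild_type, ECORI_SITE, 1)
mutant_fragments = digest(mutant, ECORI_SITE, 1)
```

On a gel, the loss of the site would appear as two of the wild-type bands being replaced by one longer band.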

Further, sequence-specific ribozymes (U.S. Patent No. 5,498,531) can be used to score for
the presence of specific mutations by development or loss of a ribozyme cleavage site. Perfectly
matched sequences can be distinguished from mismatched sequences by nuclease cleavage
digestion assays or by differences in melting temperature.

Sequence changes at specific locations can also be assessed by nuclease protection assays
such as RNase and S1 protection or the chemical cleavage method. Furthermore, sequence
differences between a mutant drug-metabolizing enzyme gene and a wild-type gene can be
determined by direct DNA sequencing. A variety of automated sequencing procedures can be
utilized when performing the diagnostic assays (Naeve, C.W., (1995) Biotechniques 19:448),
including sequencing by mass spectrometry (see, e.g., PCT International Publication No. WO
94/16101; Cohen et al., Adv. Chromatogr. 36:127-162 (1996); and Griffin et al., Appl. Biochem.
Biotechnol. 38:147-159 (1993)).

Other methods for detecting mutations in the gene include methods in which protection
from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA duplexes
(Myers et al., Science 230:1242 (1985); Cotton et al., PNAS 85:4397 (1988); Saleeba et al., Meth.
Enzymol. 217:286-295 (1992)), electrophoretic mobility of mutant and wild type nucleic acid is
compared (Orita et al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. 285:125-144 (1993); and
Hayashi et al., Genet. Anal. Tech. Appl. 9:73-79 (1992)), and movement of mutant or wild-type
fragments in polyacrylamide gels containing a gradient of denaturant is assayed using denaturing
gradient gel electrophoresis (Myers et al., Nature 313:495 (1985)). Examples of other techniques
for detecting point mutations include selective oligonucleotide hybridization, selective
amplification, and selective primer extension.

The nucleic acid molecules are also useful for testing an individual for a genotype that, while
not necessarily causing the disease, nevertheless affects the treatment modality. Thus, the nucleic
acid molecules can be used to study the relationship between an individual's genotype and the
individual's response to a compound used for treatment (pharmacogenomic relationship).
Accordingly, the nucleic acid molecules described herein can be used to assess the mutation content
of the drug-metabolizing enzyme gene in an individual in order to select an appropriate compound
or dosage regimen for treatment. Figure 3 provides SNP information that has been found in the
gene encoding the drug-metabolizing proteins of the present invention. SNPs, including
insertion/deletion variants ("indels"), were identified at 45 different nucleotide positions. Changes
in the amino acid sequence caused by these SNPs can readily be determined using the universal
genetic code and the protein sequence provided in Figure 2 as a reference. Positioning of each SNP
in exons, introns, or outside the ORF can readily be determined using the DNA positions given for
each SNP and the start/stop, exon, and intron coordinates given in the features.

Thus, nucleic acid molecules displaying genetic variations that affect treatment provide a
diagnostic target that can be used to tailor treatment in an individual. Accordingly, the production
of recombinant cells and animals containing these polymorphisms allows effective clinical design of
treatment compounds and dosage regimens.

The nucleic acid molecules are thus useful as antisense constructs to control
drug-metabolizing enzyme gene expression in cells, tissues, and organisms. A DNA antisense nucleic
acid molecule is designed to be complementary to a region of the gene involved in transcription,
preventing transcription and hence production of drug-metabolizing enzyme protein. An antisense
RNA or DNA nucleic acid molecule would hybridize to the mRNA and thus block translation of
mRNA into drug-metabolizing enzyme protein.
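
As a simple illustration of the complementarity on which antisense design rests, the Python sketch below derives a DNA antisense oligonucleotide as the reverse complement of a target region given as either DNA or mRNA. The function name and the 10-nucleotide target are hypothetical; practical antisense design would also weigh target accessibility, specificity and chemistry, none of which is modeled here.

```python
# Map each target base (DNA or RNA alphabet) to its complementary DNA base.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_dna(target):
    """Return the DNA reverse complement of a target DNA or mRNA region (5' to 3')."""
    return target.upper().translate(COMPLEMENT)[::-1]


# Hypothetical mRNA target region, written 5' to 3'.
mrna_region = "AUGGCUGGAG"
oligo = antisense_dna(mrna_region)
```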

Alternatively, a class of antisense molecules can be used to inactivate mRNA in order to
decrease expression of drug-metabolizing enzyme nucleic acid. Accordingly, these molecules can
treat a disorder characterized by abnormal or undesired drug-metabolizing enzyme nucleic acid
expression. This technique involves cleavage by means of ribozymes containing nucleotide
sequences complementary to one or more regions in the mRNA that attenuate the ability of the
mRNA to be translated. Possible regions include coding regions and particularly coding regions
corresponding to the catalytic and other functional activities of the drug-metabolizing enzyme
protein, such as substrate binding.

The nucleic acid molecules also provide vectors for gene therapy in patients containing cells 
that are aberrant in drug-metabolizing enzyme gene expression. Thus, recombinant cells, which 
include the patient's cells that have been engineered ex vivo and returned to the patient, are 
introduced into an individual where the cells produce the desired drug-metabolizing enzyme protein
to treat the individual.

The invention also encompasses kits for detecting the presence of a drug-metabolizing
enzyme nucleic acid in a biological sample. Experimental data as provided in Figure 1 indicates
that drug-metabolizing enzyme proteins of the present invention are expressed in humans in the
stomach, brain (including infant), endometrial tumors, prostate, kidney, adrenal gland tumors,
head/neck, sympathetic trunk, breast, and hepatocellular carcinomas, as indicated by virtual
northern blot analysis. PCR-based tissue screening panels also indicate expression in the brain. For
example, the kit can comprise reagents such as a labeled or labelable nucleic acid or agent capable
of detecting drug-metabolizing enzyme nucleic acid in a biological sample; means for determining
the amount of drug-metabolizing enzyme nucleic acid in the sample; and means for comparing the
amount of drug-metabolizing enzyme nucleic acid in the sample with a standard. The compound or
agent can be packaged in a suitable container. The kit can further comprise instructions for using
the kit to detect drug-metabolizing enzyme protein mRNA or DNA.

Nucleic Acid Arrays

The present invention further provides nucleic acid detection kits, such as arrays or
microarrays of nucleic acid molecules that are based on the sequence information provided in
Figures 1 and 3 (SEQ ID NOS:1 and 3).

As used herein "Arrays" or "Microarrays" refers to an array of distinct polynucleotides or
oligonucleotides synthesized on a substrate, such as paper, nylon or other type of membrane,
filter, chip, glass slide, or any other suitable solid support. In one embodiment, the microarray is
prepared and used according to the methods described in US Patent 5,837,832, Chee et al., PCT
application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996; Nat. Biotech. 14: 1675-1680)
and Schena, M. et al. (1996; Proc. Natl. Acad. Sci. 93: 10614-10619), all of which are
incorporated herein in their entirety by reference. In other embodiments, such arrays are
produced by the methods described by Brown et al., US Patent No. 5,807,522.

The microarray or detection kit is preferably composed of a large number of unique,
single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or
fragments of cDNAs, fixed to a solid support. The oligonucleotides are preferably about 6-60
nucleotides in length, more preferably 15-30 nucleotides in length, and most preferably about
20-25 nucleotides in length. For a certain type of microarray or detection kit, it may be preferable to
use oligonucleotides that are only 7-20 nucleotides in length. The microarray or detection kit
may contain oligonucleotides that cover the known 5' or 3' sequence; sequential
oligonucleotides that cover the full length sequence; or unique oligonucleotides selected from
particular areas along the length of the sequence. Polynucleotides used in the microarray or
detection kit may be oligonucleotides that are specific to a gene or genes of interest.

In order to produce oligonucleotides to a known sequence for a microarray or detection
kit, the gene(s) of interest (or an ORF identified from the contigs of the present invention) is
typically examined using a computer algorithm which starts at the 5' or at the 3' end of the
nucleotide sequence. Typical algorithms will then identify oligomers of defined length that are
unique to the gene, have a GC content within a range suitable for hybridization, and lack
predicted secondary structure that may interfere with hybridization. In certain situations it may
be appropriate to use pairs of oligonucleotides on a microarray or detection kit. The "pairs" will
be identical, except for one nucleotide that preferably is located in the center of the sequence.
The second oligonucleotide in the pair (mismatched by one) serves as a control. The number of
oligonucleotide pairs may range from two to one million. The oligomers are synthesized at
designated areas on a substrate using a light-directed chemical process. The substrate may be
paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid
support.
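
The oligomer selection outlined above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the GC window of 40-60%, the default length of 20, and all function names are hypothetical choices; secondary-structure prediction is omitted entirely; and uniqueness is checked only against the gene itself and any background sequences supplied by the caller.

```python
def gc_fraction(oligo):
    """Fraction of G and C bases in an oligonucleotide."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def occurs_once(oligo, text):
    """True if `oligo` occurs exactly once in `text` (overlapping matches counted)."""
    first = text.find(oligo)
    return first != -1 and text.find(oligo, first + 1) == -1


def design_oligos(gene, length=20, gc_min=0.40, gc_max=0.60, background=()):
    """Scan from the 5' end and keep oligomers that are unique and within the GC range."""
    gene = gene.upper()
    others = [seq.upper() for seq in background]
    picks = []
    for start in range(len(gene) - length + 1):
        oligo = gene[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue
        if not occurs_once(oligo, gene) or any(oligo in other for other in others):
            continue
        picks.append((start, oligo))
    return picks


def mismatch_control(oligo):
    """Return the paired control: identical except for the central nucleotide."""
    swap = {"A": "T", "T": "A", "G": "C", "C": "G"}
    mid = len(oligo) // 2
    return oligo[:mid] + swap[oligo[mid]] + oligo[mid + 1:]


# Hypothetical 20-nucleotide toy gene with 50% GC content.
toy_gene = "ATATATATATGCGCGCGCGC"
candidates = design_oligos(toy_gene, length=20)
pairs = [(oligo, mismatch_control(oligo)) for _, oligo in candidates]
```

Swapping the central base for its complement is only one possible way to generate the single-mismatch control; the text does not specify which substitution is used.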

In another aspect, an oligonucleotide may be synthesized on the surface of the substrate
by using a chemical coupling procedure and an ink jet application apparatus, as described in PCT
application WO95/251116 (Baldeschweiler et al.) which is incorporated herein in its entirety by
reference. In another aspect, a "gridded" array analogous to a dot (or slot) blot may be used to
arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a
vacuum system, thermal, UV, mechanical or chemical bonding procedures. An array, such as
those described above, may be produced by hand or by using available devices (slot blot or dot
blot apparatus), materials (any suitable solid support), and machines (including robotic
instruments), and may contain 8, 24, 96, 384, 1536, 6144 or more oligonucleotides, or any other
number between two and one million which lends itself to the efficient use of commercially
available instrumentation.

In order to conduct sample analysis using a microarray or detection kit, the RNA or DNA
from a biological sample is made into hybridization probes. The mRNA is isolated, and cDNA is
produced and used as a template to make antisense RNA (aRNA). The aRNA is amplified in the
presence of fluorescent nucleotides, and labeled probes are incubated with the microarray or
detection kit so that the probe sequences hybridize to complementary oligonucleotides of the
microarray or detection kit. Incubation conditions are adjusted so that hybridization occurs with
precise complementary matches or with various degrees of less complementarity. After removal
of nonhybridized probes, a scanner is used to determine the levels and patterns of fluorescence.
The scanned images are examined to determine degree of complementarity and the relative
abundance of each oligonucleotide sequence on the microarray or detection kit. The biological
samples may be obtained from any bodily fluids (such as blood, urine, saliva, phlegm, gastric
juices, etc.), cultured cells, biopsies, or other tissue preparations. A detection system may be
used to measure the absence, presence, and amount of hybridization for all of the distinct
sequences simultaneously. This data may be used for large-scale correlation studies on the
sequences, expression patterns, mutations, variants, or polymorphisms among samples.

Using such arrays, the present invention provides methods to identify the expression of
the drug-metabolizing enzyme proteins/peptides of the present invention. In detail, such
methods comprise incubating a test sample with one or more nucleic acid molecules and
assaying for binding of the nucleic acid molecule with components within the test sample. Such
assays will typically involve arrays comprising many genes, at least one of which is a gene of the
present invention and/or alleles of the drug-metabolizing enzyme gene of the present invention.
Figure 3 provides SNP information that has been found in the gene encoding the
drug-metabolizing proteins of the present invention. SNPs, including insertion/deletion variants
("indels"), were identified at 45 different nucleotide positions. Changes in the amino acid
sequence caused by these SNPs can readily be determined using the universal genetic code and
the protein sequence provided in Figure 2 as a reference. Positioning of each SNP in exons,
introns, or outside the ORF can readily be determined using the DNA positions given for each
SNP and the start/stop, exon, and intron coordinates given in the features. 
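
The two determinations described above can be sketched in Python as follows. This is a minimal illustration only: it translates the affected codon before and after a single-base substitution using the standard genetic code, and classifies a position against a list of feature coordinates. The coding sequence, SNP position, and exon/intron coordinates in the usage note are hypothetical placeholders, not values taken from Figure 2 or Figure 3.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def snp_effect(cds, pos, alt):
    """Return (reference_aa, variant_aa) for a substitution in a coding sequence.

    cds is the open reading frame starting at the first base of the start
    codon; pos is the 0-based index of the substituted base within cds.
    """
    start = pos - pos % 3                     # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


def locate(pos, features):
    """Name the feature (exon, intron, ...) whose inclusive range contains pos."""
    for name, start, end in features:
        if start <= pos <= end:
            return name
    return "outside annotated features"
```

For example, with the hypothetical reading frame `ATGGAATTC`, `snp_effect("ATGGAATTC", 3, "A")` changes the codon GAA to AAA and returns `("E", "K")`, a non-synonymous change, whereas a change at index 5 from A to G leaves glutamate unchanged.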

Conditions for incubating a nucleic acid molecule with a test sample vary. Incubation
conditions depend on the format employed in the assay, the detection methods employed, and the
type and nature of the nucleic acid molecule used in the assay. One skilled in the art will
recognize that any one of the commonly available hybridization, amplification or array assay
formats can readily be adapted to employ the novel fragments of the Human genome disclosed
herein. Examples of such assays can be found in Chard, T., An Introduction to
Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The
Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic
Press, Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular
Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

The test samples of the present invention include cells, protein or membrane extracts of 
cells. The test sample used in the above-described method will vary based on the assay format, 
nature of the detection method and the tissues, cells or extracts used as the sample to be assayed.
Methods for preparing nucleic acid extracts of cells are well known in the art and can be
readily adapted in order to obtain a sample that is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the 
necessary reagents to carry out the assays of the present invention. 

Specifically, the invention provides a compartmentalized kit to receive, in close
confinement, one or more containers which comprises: (a) a first container comprising one of the
nucleic acid molecules that can bind to a fragment of the Human genome disclosed herein; and
(b) one or more other containers comprising one or more of the following: wash reagents,
reagents capable of detecting presence of a bound nucleic acid.

In detail, a compartmentalized kit includes any kit in which reagents are contained in
separate containers. Such containers include small glass containers, plastic containers, strips of
plastic, glass or paper, or arraying material such as silica. Such containers allow one to
efficiently transfer reagents from one compartment to another compartment such that the
samples and reagents are not cross-contaminated, and the agents or solutions of each container
can be added in a quantitative fashion from one compartment to another. Such containers will
include a container which will accept the test sample, a container which contains the nucleic acid
probe, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers,
etc.), and containers which contain the reagents used to detect the bound probe. One skilled in
the art will readily recognize that the previously unidentified drug-metabolizing enzyme gene of
the present invention can be routinely identified using the sequence information disclosed herein
and can be readily incorporated into one of the established kit formats which are well known in the
art, particularly expression arrays.

Vectors/host cells

The invention also provides vectors containing the nucleic acid molecules described herein.
The term "vector" refers to a vehicle, preferably a nucleic acid molecule, which can transport the
nucleic acid molecules. When the vector is a nucleic acid molecule, the nucleic acid molecules are
covalently linked to the vector nucleic acid. With this aspect of the invention, the vector includes a
plasmid, single or double stranded phage, a single or double stranded RNA or DNA viral vector, or
artificial chromosome, such as a BAC, PAC, YAC, or MAC.

A vector can be maintained in the host cell as an extrachromosomal element where it 
replicates and produces additional copies of the nucleic acid molecules. Alternatively, the vector 
may integrate into the host cell genome and produce additional copies of the nucleic acid molecules
when the host cell replicates.

The invention provides vectors for the maintenance (cloning vectors) or vectors for 
expression (expression vectors) of the nucleic acid molecules. The vectors can function in 
prokaryotic or eukaryotic cells or in both (shuttle vectors). 

Expression vectors contain cis-acting regulatory regions that are operably linked in the
vector to the nucleic acid molecules such that transcription of the nucleic acid molecules is allowed
in a host cell. The nucleic acid molecules can be introduced into the host cell with a separate
nucleic acid molecule capable of affecting transcription. Thus, the second nucleic acid molecule
may provide a trans-acting factor interacting with the cis-regulatory control region to allow
transcription of the nucleic acid molecules from the vector. Alternatively, a trans-acting factor may
be supplied by the host cell. Finally, a trans-acting factor can be produced from the vector itself. It
is understood, however, that in some embodiments, transcription and/or translation of the nucleic
acid molecules can occur in a cell-free system.

The regulatory sequences to which the nucleic acid molecules described herein can be
operably linked include promoters for directing mRNA transcription. These include, but are not
limited to, the left promoter from bacteriophage λ, the lac, TRP, and TAC promoters from E. coli,
the early and late promoters from SV40, the CMV immediate early promoter, the adenovirus early
and late promoters, and retrovirus long-terminal repeats.

In addition to control regions that promote transcription, expression vectors may also
include regions that modulate transcription, such as repressor binding sites and enhancers.
Examples include the SV40 enhancer, the cytomegalovirus immediate early enhancer, polyoma
enhancer, adenovirus enhancers, and retrovirus LTR enhancers.

In addition to containing sites for transcription initiation and control, expression vectors can
also contain sequences necessary for transcription termination and, in the transcribed region, a
ribosome binding site for translation. Other regulatory control elements for expression include
initiation and termination codons as well as polyadenylation signals. The person of ordinary skill in
the art would be aware of the numerous regulatory sequences that are useful in expression vectors.
Such regulatory sequences are described, for example, in Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
(1989).

A variety of expression vectors can be used to express a nucleic acid molecule. Such
vectors include chromosomal, episomal, and virus-derived vectors, for example vectors derived
from bacterial plasmids, from bacteriophage, from yeast episomes, from yeast chromosomal
elements, including yeast artificial chromosomes, from viruses such as baculoviruses,
papovaviruses such as SV40, Vaccinia viruses, adenoviruses, poxviruses, pseudorabies viruses, and
retroviruses. Vectors may also be derived from combinations of these sources such as those derived
from plasmid and bacteriophage genetic elements, e.g. cosmids and phagemids. Appropriate
cloning and expression vectors for prokaryotic and eukaryotic hosts are described in Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY (1989).

The regulatory sequence may provide constitutive expression in one or more host cells (i.e.
tissue specific) or may provide for inducible expression in one or more cell types such as by
temperature, nutrient additive, or exogenous factor such as a hormone or other ligand. A variety of
vectors providing for constitutive and inducible expression in prokaryotic and eukaryotic hosts are
well known to those of ordinary skill in the art.

The nucleic acid molecules can be inserted into the vector nucleic acid by well-known
methodology. Generally, the DNA sequence that will ultimately be expressed is joined to an
expression vector by cleaving the DNA sequence and the expression vector with one or more
restriction enzymes and then ligating the fragments together. Procedures for restriction enzyme
digestion and ligation are well known to those of ordinary skill in the art.
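
When planning such a digestion, it is common to check which enzymes cut inside the sequence to be expressed, since an enzyme that cuts the insert internally is usually unsuitable for cloning it intact. The Python sketch below is illustrative only: it searches a sequence for the recognition sites of three widely used enzymes (EcoRI, BamHI, HindIII). The enzyme panel and the example insert are assumptions chosen for illustration and are not prescribed by the specification.

```python
# Recognition sequences (5'->3') of three common restriction enzymes.
# All three sites are palindromic, so searching one strand is sufficient.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, site):
    """Return the 0-based start index of every occurrence of site in seq."""
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)   # step by one so overlapping sites are found
    return hits


def non_cutters(insert, sites=SITES):
    """Names of enzymes whose recognition site does not occur in the insert."""
    return sorted(name for name, site in sites.items()
                  if not find_sites(insert, site))
```

For the hypothetical insert `CCATGGAATTCTCAAGCTTGAG`, `find_sites` reports an EcoRI site at index 5 and a HindIII site at index 13, so `non_cutters` returns only `["BamHI"]`.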

The vector containing the appropriate nucleic acid molecule can be introduced into an
appropriate host cell for propagation or expression using well-known techniques. Bacterial cells
include, but are not limited to, E. coli, Streptomyces, and Salmonella typhimurium. Eukaryotic cells
include, but are not limited to, yeast, insect cells such as Drosophila, animal cells such as COS and
CHO cells, and plant cells.

As described herein, it may be desirable to express the peptide as a fusion protein.
Accordingly, the invention provides fusion vectors that allow for the production of the peptides.
Fusion vectors can increase the expression of a recombinant protein, increase the solubility of the
recombinant protein, and aid in the purification of the protein by acting, for example, as a ligand for
affinity purification. A proteolytic cleavage site may be introduced at the junction of the fusion
moiety so that the desired peptide can ultimately be separated from the fusion moiety. Proteolytic
enzymes include, but are not limited to, factor Xa, thrombin, and enterokinase. Typical fusion
expression vectors include pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New England
Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione S-transferase
(GST), maltose E binding protein, or protein A, respectively, to the target recombinant
protein. Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann
et al., Gene 69:301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods
in Enzymology 185:60-89 (1990)).

Recombinant protein expression can be maximized in host bacteria by providing a genetic
background wherein the host cell has an impaired capacity to proteolytically cleave the recombinant
protein (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic
Press, San Diego, California (1990) 119-128). Alternatively, the sequence of the nucleic acid
molecule of interest can be altered to provide preferential codon usage for a specific host cell, for
example E. coli (Wada et al., Nucleic Acids Res. 20:2111-2118 (1992)).
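
Altering a sequence for preferential codon usage amounts to replacing codons with synonymous ones while leaving the encoded protein unchanged. The Python sketch below shows that idea under stated assumptions: the `preferred` mapping in the usage note is a hypothetical preference table chosen for illustration, not measured codon-usage data for any host, and the function names are likewise illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate complete codons of cds; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def recode(cds, preferred):
    """Swap each codon for the preferred synonymous codon, if one is listed.

    preferred maps a one-letter amino acid to the codon to use for it;
    amino acids without an entry keep their original codon.
    """
    usable = len(cds) - len(cds) % 3
    out = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    recoded = "".join(out)
    if translate(recoded) != translate(cds):
        raise ValueError("preferred table changes the encoded protein")
    return recoded
```

With the hypothetical table `{"R": "CGT", "L": "CTG"}`, `recode("ATGAGACTAAGGTAA", ...)` yields `ATGCGTCTGCGTTAA`; both sequences translate to `MRLR*`. The final check guards against a preference table that accidentally maps an amino acid to a non-synonymous codon.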

The nucleic acid molecules can also be expressed by expression vectors that are operative in
yeast. Examples of vectors for expression in yeast, e.g., S. cerevisiae, include pYepSec1 (Baldari et
al., EMBO J. 6:229-234 (1987)), pMFa (Kurjan et al., Cell 30:933-943 (1982)), pJRY88 (Schultz et
al., Gene 54:113-123 (1987)), and pYES2 (Invitrogen Corporation, San Diego, CA).

The nucleic acid molecules can also be expressed in insect cells using, for example,
baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured
insect cells (e.g., Sf9 cells) include the pAc series (Smith et al., Mol. Cell Biol. 3:2156-2165
(1983)) and the pVL series (Lucklow et al., Virology 170:31-39 (1989)).

In certain embodiments of the invention, the nucleic acid molecules described herein are
expressed in mammalian cells using mammalian expression vectors. Examples of mammalian
expression vectors include pCDM8 (Seed, B., Nature 329:840 (1987)) and pMT2PC (Kaufman et al.,
EMBO J. 6:187-195 (1987)).

The expression vectors listed herein are provided by way of example only of the well-known
vectors available to those of ordinary skill in the art that would be useful to express the
nucleic acid molecules. The person of ordinary skill in the art would be aware of other vectors
suitable for maintenance, propagation or expression of the nucleic acid molecules described herein.
These are found for example in Sambrook, J., Fritsch, E. F., and Maniatis, T., Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989.

The invention also encompasses vectors in which the nucleic acid sequences described 
herein are cloned into the vector in reverse orientation, but operably linked to a regulatory sequence 
that permits transcription of antisense RNA. Thus, an antisense transcript can be produced to all, or
to a portion, of the nucleic acid molecule sequences described herein, including both coding and
non-coding regions. Expression of this antisense RNA is subject to each of the parameters 
described above in relation to expression of the sense RNA (regulatory sequences, constitutive or 
inducible expression, tissue-specific expression). 
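
The relationship between a sense sequence and its antisense transcript can be shown with a short Python sketch. It assumes a DNA sense strand written 5' to 3' using only the letters A, C, G, and T; the function names and the example fragment are illustrative and do not come from the specification.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna):
    """RNA transcribed when the sense sequence is cloned in reverse orientation.

    The transcript is the reverse complement of the sense strand with
    uracil in place of thymine, so it can base-pair with the sense mRNA.
    """
    return reverse_complement(sense_dna).replace("T", "U")
```

For the hypothetical fragment `ATGGAATTCTC`, `reverse_complement` gives `GAGAATTCCAT` and `antisense_rna` gives `GAGAAUUCCAU`. Passing only part of a sequence models an antisense transcript directed to a portion of the molecule.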

The invention also relates to recombinant host cells containing the vectors described herein.
Host cells therefore include prokaryotic cells, lower eukaryotic cells such as yeast, other eukaryotic
cells such as insect cells, and higher eukaryotic cells such as mammalian cells.

The recombinant host cells are prepared by introducing the vector constructs described
herein into the cells by techniques readily available to the person of ordinary skill in the art. These
include, but are not limited to, calcium phosphate transfection, DEAE-dextran-mediated
transfection, cationic lipid-mediated transfection, electroporation, transduction, infection,
lipofection, and other techniques such as those found in Sambrook et al. (Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989).

Host cells can contain more than one vector. Thus, different nucleotide sequences can be
introduced on different vectors of the same cell. Similarly, the nucleic acid molecules can be
introduced either alone or with other nucleic acid molecules that are not related to the nucleic acid
molecules such as those providing trans-acting factors for expression vectors. When more than one
vector is introduced into a cell, the vectors can be introduced independently, co-introduced or joined
to the nucleic acid molecule vector.

In the case of bacteriophage and viral vectors, these can be introduced into cells as packaged
or encapsulated virus by standard procedures for infection and transduction. Viral vectors can be
replication-competent or replication-defective. In the case in which viral replication is defective,
replication will occur in host cells providing functions that complement the defects.

Vectors generally include selectable markers that enable the selection of the subpopulation
of cells that contain the recombinant vector constructs. The marker can be contained in the same
vector that contains the nucleic acid molecules described herein or may be on a separate vector.
Markers include tetracycline or ampicillin-resistance genes for prokaryotic host cells and
dihydrofolate reductase or neomycin resistance for eukaryotic host cells. However, any marker that
provides selection for a phenotypic trait will be effective.

While the mature proteins can be produced in bacteria, yeast, mammalian cells, and other
cells under the control of the appropriate regulatory sequences, cell-free transcription and
translation systems can also be used to produce these proteins using RNA derived from the DNA
constructs described herein.

Where secretion of the peptide is desired, appropriate secretion signals are incorporated into 
the vector. The signal sequence can be endogenous to the peptides or heterologous to these 
peptides. 

Where the peptide is not secreted into the medium, the protein can be isolated from the host
cell by standard disruption procedures, including freeze thaw, sonication, mechanical disruption, use
of lysing agents and the like. The peptide can then be recovered and purified by well-known
purification methods including ammonium sulfate precipitation, acid extraction, anion or cationic
exchange chromatography, phosphocellulose chromatography, hydrophobic-interaction
chromatography, affinity chromatography, hydroxylapatite chromatography, lectin chromatography,
or high performance liquid chromatography.

It is also understood that depending upon the host cell in recombinant production of the
peptides described herein, the peptides can have various glycosylation patterns, depending upon the
cell, or may be non-glycosylated as when produced in bacteria. In addition, the peptides may
include an initial modified methionine in some cases as a result of a host-mediated process.

Uses of vectors and host cells

The recombinant host cells expressing the peptides described herein have a variety of uses. 
First, the cells are useful for producing a drug-metabolizing enzyme protein or peptide that can be 
further purified to produce desired amounts of drug-metabolizing enzyme protein or fragments.
Thus, host cells containing expression vectors are useful for peptide production.

Host cells are also useful for conducting cell-based assays involving the drug-metabolizing
enzyme protein or drug-metabolizing enzyme protein fragments, such as those described above as
well as other formats known in the art. Thus, a recombinant host cell expressing a native
drug-metabolizing enzyme protein is useful for assaying compounds that stimulate or inhibit
drug-metabolizing enzyme protein function.

Host cells are also useful for identifying drug-metabolizing enzyme protein mutants in
which these functions are affected. If the mutants naturally occur and give rise to a pathology, host
cells containing the mutations are useful to assay compounds that have a desired effect on the
mutant drug-metabolizing enzyme protein (for example, stimulating or inhibiting function) which
may not be indicated by their effect on the native drug-metabolizing enzyme protein.

Genetically engineered host cells can be further used to produce non-human transgenic
animals. A transgenic animal is preferably a mammal, for example a rodent, such as a rat or mouse,
in which one or more of the cells of the animal include a transgene. A transgene is exogenous DNA
which is integrated into the genome of a cell from which a transgenic animal develops and which
remains in the genome of the mature animal in one or more cell types or tissues of the transgenic
animal. These animals are useful for studying the function of a drug-metabolizing enzyme protein 
and identifying and evaluating modulators of drug-metabolizing enzyme protein activity. Other 
examples of transgenic animals include non-human primates, sheep, dogs, cows, goats, chickens, 
and amphibians. 

A transgenic animal can be produced by introducing nucleic acid into the male pronuclei of
a fertilized oocyte, e.g., by microinjection or retroviral infection, and allowing the oocyte to develop
in a pseudopregnant female foster animal. Any of the drug-metabolizing enzyme protein nucleotide
sequences can be introduced as a transgene into the genome of a non-human animal, such as a
mouse.

Any of the regulatory or other sequences useful in expression vectors can form part of the
transgenic sequence. This includes intronic sequences and polyadenylation signals, if not already
included. A tissue-specific regulatory sequence(s) can be operably linked to the transgene to direct
expression of the drug-metabolizing enzyme protein to particular cells.

Methods for generating transgenic animals via embryo manipulation and microinjection,
particularly animals such as mice, have become conventional in the art and are described, for
example, in U.S. Patent Nos. 4,736,866 and 4,870,009, both by Leder et al., U.S. Patent No.
4,873,191 by Wagner et al., and in Hogan, B., Manipulating the Mouse Embryo (Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). Similar methods are used for
production of other transgenic animals. A transgenic founder animal can be identified based upon
the presence of the transgene in its genome and/or expression of transgenic mRNA in tissues or
cells of the animals. A transgenic founder animal can then be used to breed additional animals
carrying the transgene. Moreover, transgenic animals carrying a transgene can further be bred to
other transgenic animals carrying other transgenes. A transgenic animal also includes animals in
which the entire animal or tissues in the animal have been produced using the homologously
recombinant host cells described herein.

In another embodiment, transgenic non-human animals can be produced which contain
selected systems that allow for regulated expression of the transgene. One example of such a
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso et al., PNAS 89:6232-6236 (1992). Another example of a
recombinase system is the FLP recombinase system of S. cerevisiae (O'Gorman et al., Science
251:1351-1355 (1991)). If a cre/loxP recombinase system is used to regulate expression of the
transgene, animals containing transgenes encoding both the Cre recombinase and a selected protein
are required. Such animals can be provided through the construction of "double" transgenic animals,
e.g., by mating two transgenic animals, one containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described herein can also be produced
according to the methods described in Wilmut, I. et al., Nature 385:810-813 (1997) and PCT
International Publication Nos. WO 97/07668 and WO 97/07669. In brief, a cell, e.g., a somatic cell,
from the transgenic animal can be isolated and induced to exit the growth cycle and enter G0 phase.
The quiescent cell can then be fused, e.g., through the use of electrical pulses, to an enucleated
oocyte from an animal of the same species from which the quiescent cell is isolated. The
reconstructed oocyte is then cultured such that it develops to morula or blastocyst and then
transferred to a pseudopregnant female foster animal. The offspring born of this female foster animal
will be a clone of the animal from which the cell, e.g., the somatic cell, is isolated.

WO 02/34922 PCT/US01/42528

Transgenic animals containing recombinant cells that express the peptides described herein
are useful to conduct the assays described herein in an in vivo context. Accordingly, the various
physiological factors that are present in vivo and that could affect substrate binding,
drug-metabolizing enzyme protein activation, and signal transduction, may not be evident from in vitro
cell-free or cell-based assays. Accordingly, it is useful to provide non-human transgenic animals to
assay in vivo drug-metabolizing enzyme protein function, including substrate interaction, the effect
of specific mutant drug-metabolizing enzyme proteins on drug-metabolizing enzyme protein
function and substrate interaction, and the effect of chimeric drug-metabolizing enzyme proteins. It
is also possible to assess the effect of null mutations, that is, mutations that substantially or
completely eliminate one or more drug-metabolizing enzyme protein functions.

All publications and patents mentioned in the above specification are herein incorporated 
by reference. Various modifications and variations of the described method and system of the 
invention will be apparent to those skilled in the art without departing from the scope and spirit 
of the invention. Although the invention has been described in connection with specific 
preferred embodiments, it should be understood that the invention as claimed should not be 
unduly limited to such specific embodiments. Indeed, various modifications of the above- 
described modes for carrying out the invention which are obvious to those skilled in the field of 
molecular biology or related fields are intended to be within the scope of the following claims.

Claims

That which is claimed is: 

1. An isolated peptide consisting of an amino acid sequence selected from the group
consisting of:

(a) an amino acid sequence shown in SEQ ID NO:2;

(b) an amino acid sequence of an allelic variant of an amino acid sequence
shown in SEQ ID NO:2, wherein said allelic variant is encoded by a nucleic acid molecule that
hybridizes under stringent conditions to the opposite strand of a nucleic acid molecule shown in
SEQ ID NOS:1 or 3;

(c) an amino acid sequence of an ortholog of an amino acid sequence shown in
SEQ ID NO:2, wherein said ortholog is encoded by a nucleic acid molecule that hybridizes under
stringent conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;
and

(d) a fragment of an amino acid sequence shown in SEQ ID NO:2, wherein said
fragment comprises at least 10 contiguous amino acids.

2. An isolated peptide comprising an amino acid sequence selected from the group 
consisting of: 

(a) an amino acid sequence shown in SEQ ID NO:2; 

(b) an amino acid sequence of an allelic variant of an amino acid sequence 
shown in SEQ ID NO:2, wherein said allelic variant is encoded by a nucleic acid molecule that 
hybridizes under stringent conditions to the opposite strand of a nucleic acid molecule shown in
SEQ ID NOS:1 or 3;

(c) an amino acid sequence of an ortholog of an amino acid sequence shown in
SEQ ID NO:2, wherein said ortholog is encoded by a nucleic acid molecule that hybridizes under
stringent conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;
and 

(d) a fragment of an amino acid sequence shown in SEQ ID NO:2, wherein said 
fragment comprises at least 10 contiguous amino acids. 

3. An isolated antibody that selectively binds to a peptide of claim 2.

4. An isolated nucleic acid molecule consisting of a nucleotide sequence selected from
the group consisting of:

(a) a nucleotide sequence that encodes an amino acid sequence shown in SEQ
ID NO:2;

(b) a nucleotide sequence that encodes an allelic variant of an amino acid
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent
conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(c) a nucleotide sequence that encodes an ortholog of an amino acid sequence
shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent conditions to
the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(d) a nucleotide sequence that encodes a fragment of an amino acid sequence
shown in SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous amino acids; and

(e) a nucleotide sequence that is the complement of a nucleotide sequence of
(a)-(d).

5. An isolated nucleic acid molecule comprising a nucleotide sequence selected from
the group consisting of:

(a) a nucleotide sequence that encodes an amino acid sequence shown in SEQ
ID NO:2;

(b) a nucleotide sequence that encodes an allelic variant of an amino acid
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent
conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(c) a nucleotide sequence that encodes an ortholog of an amino acid sequence
shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent conditions to
the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(d) a nucleotide sequence that encodes a fragment of an amino acid sequence
shown in SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous amino acids; and

(e) a nucleotide sequence that is the complement of a nucleotide sequence of
(a)-(d).

6. A gene chip comprising a nucleic acid molecule of claim 5.

7. A transgenic non-human animal comprising a nucleic acid molecule of claim 5.

8. A nucleic acid vector comprising a nucleic acid molecule of claim 5.

9. A host cell containing the vector of claim 8.

10. A method for producing any of the peptides of claim 1 comprising introducing a
nucleotide sequence encoding any of the amino acid sequences in (a)-(d) into a host cell, and
culturing the host cell under conditions in which the peptides are expressed from the nucleotide
sequence.

11. A method for producing any of the peptides of claim 2 comprising introducing a 
nucleotide sequence encoding any of the amino acid sequences in (a)-(d) into a host cell, and 
culturing the host cell under conditions in which the peptides are expressed from the nucleotide
sequence. 

12. A method for detecting the presence of any of the peptides of claim 2 in a sample, 
said method comprising contacting said sample with a detection agent that specifically allows 
detection of the presence of the peptide in the sample and then detecting the presence of the peptide. 

13. A method for detecting the presence of a nucleic acid molecule of claim 5 in a 
sample, said method comprising contacting the sample with an oligonucleotide that hybridizes to
said nucleic acid molecule under stringent conditions and determining whether the oligonucleotide
binds to said nucleic acid molecule in the sample. 

14. A method for identifying a modulator of a peptide of claim 2, said method 
comprising contacting said peptide with an agent and determining if said agent has modulated the
function or activity of said peptide. 

15. The method of claim 14, wherein said agent is administered to a host cell comprising
an expression vector that expresses said peptide.

16. A method for identifying an agent that binds to any of the peptides of claim 2, said
method comprising contacting the peptide with an agent and assaying the contacted mixture to
determine whether a complex is formed with the agent bound to the peptide.

17. A pharmaceutical composition comprising an agent identified by the method of
claim 16 and a pharmaceutically acceptable carrier therefor.

18. A method for treating a disease or condition mediated by a human drug-metabolizing
enzyme protein, said method comprising administering to a patient a pharmaceutically
effective amount of an agent identified by the method of claim 16. 

19. A method for identifying a modulator of the expression of a peptide of claim 2, said 
method comprising contacting a cell expressing said peptide with an agent, and determining if said
agent has modulated the expression of said peptide.

20. An isolated human drug-metabolizing enzyme peptide having an amino acid 
sequence that shares at least 70% homology with an amino acid sequence shown in SEQ ID NO:2.

21. A peptide according to claim 20 that shares at least 90 percent homology with an
amino acid sequence shown in SEQ ID NO:2. 

22. An isolated nucleic acid molecule encoding a human drug-metabolizing enzyme 
peptide, said nucleic acid molecule sharing at least 80 percent homology with a nucleic acid 
molecule shown in SEQ ID NOSil or 3. . 

23. A nucleic acid niolcculc according to claim 22 that shares at least 90 percent 
homology with a nucleic acid molecule showoi in SEQ ID NOS:l or 3. 

2201 TTTTATATGT AGAAATGTAG
2251 ATATTGTTAT TGATTTTTTT
2301 AAAAAAAAAA AAAAAAAAAA (SEQ ID NO:1)

AGGCCTGAGC TGCCCCTCCC ACTGCCTTTC 
GCTTCGCGAG GGCCCAGAGA GGCGGTGGGG 
CCGGGCGGGA GAAAGCCCAC CCTCTCCCGC 
TTCGGCGCTG CGCAGAGCCA TGGAATTCTC 
CGCGGCCCTT TTACCTGGCG TTCGTGTTCT 
CAGGCCATTA AGCTGTACCT GCGGAGGCAG 
CCCCTTCCCA GCGCCCCCCA CCCACTGGTT 
TTCAGGATGA TAACATGGAG AAGCTTGAGG 
CGTGCCTTCC CTTTCTGGAT TGGGCCCTTT 
TGACCCAGAC TATGCAAAGA CACTTCTGAG 
GGTACCTGCA GAAATTCTCA CCTCCACTTC 
CTAGACGGAC CCAAGTGGTT CCAGCATCGT 
CCATTTTAAC ATCCTGAAAG CATACATTGA 
AAATGATGCT GGATAAGTGG GAGAAGATTT 
GTGGAGGTCT ATGAGCACAT CAACTCGATG 
ATGCGCTTTC AGCAAGGAGA CCAACTGCCA 
CTTATGCAAA AGCCATATTT GAACTCAGCA 
TACAGTTTGT TGTATCACAG TGACATAATT 
CTACCGCTTC CAGAAGTTAA GCCGAGTGTT 
TAATCCAGGA AAGAAAGAAA TCCCTCCAGG 
ACTCCGAAGA GGAAGTACCA GGATTTTCTG 
GGATGAAAGT GGTAGCAGCT TCTCAGATAT 
GCACATTCCT GTTGGCAGGA CATGACACCT 
ATCCTTTACT GCCTGGCTCT GAACCCTGAG 
GGAGGTCAGG GGCATCCTGG GGGATGGGTC 
TGGGTGAGAT GTCGTACACC ACAATGTGCA 
ATTCCTGCAG TCCCGTCCAT TTCCAGAGAT 
CCCAGATGGA TGCACATTGC CTGCAGGGAT 
GGGGTCTTCA CCACAACCCT GCTGCTGTCT 
GACCCCTTGA GGTTCTCTCA GGAGAATTCT 
CTACTTACCA TTCTCAGCTG GATCAAGGAA 
CCATGATTGA GTTAAAGGTA ACCATTGCCT 
GTGACTCCAG ACCCCACCAG GCCTCTTACT 
CAAGCCCAAG AATGGGATGT ATTTGCACCT 
AGATCTCAGG GTACAATGAT TAAACGTACT 
TACAGCTAAT GATCCAAGCA GATAGAAAGG 
ATTGGAGGTT GGTGGGATAG GGGTCTCTGT 
CTAGGTACAC AGTGTGTCAG CTAGATCTGT 
TTTTCAGATC TTTTCTGTTA AACTTTCACT 
CAATAGACTT TCATATATTT TCTGTTGTTT 
ATGCAAGTAA TAAGTGCATG TATGCTCACT 
GAAAATCATG TAGAATAAAA ATTTTAAATC 
TCCATGCCCT GACCAATCCT ACTGCTTTTC 
TGTGCATTCT TTCAGACTTT TTCCTATACA 
CAATGTATTT GTATAGATGT GATCATTCCT 
CACTTAATAA AAATTCACCT TATTCCTTAA 
AAAAAAA 



FEATURES: 

5'UTR: 1-189 

Start Codon: 190 

Stop Codon: 1720 

3'UTR: 1723-2327 
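
The feature coordinates listed above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the coordinates are 1-based and inclusive, and that "Stop Codon: 1720" gives the first base of the stop codon (an interpretation suggested by the 3'UTR beginning at 1723). The function name `cds_summary` is hypothetical and not part of the original disclosure.

```python
def cds_summary(start_codon: int, stop_codon: int, utr3_start: int) -> dict:
    """Summarise a coding region from 1-based, inclusive cDNA coordinates.

    Assumption: start_codon and stop_codon are the positions of the FIRST
    base of the respective codons.
    """
    coding_bases = stop_codon - start_codon  # bases preceding the stop codon
    if coding_bases % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return {
        "utr5_length": start_codon - 1,            # bases before the start codon
        "amino_acids": coding_bases // 3,          # codons, excluding the stop
        "stop_codon_end": stop_codon + 2,          # last base of the stop codon
        "utr3_follows_stop": utr3_start == stop_codon + 3,
    }


# Values taken from the FEATURES list of the cDNA sequence.
summary = cds_summary(start_codon=190, stop_codon=1720, utr3_start=1723)
print(summary)
```

Under these assumptions the open reading frame is in frame, spans 510 codons before the stop codon, and is immediately followed by the 3'UTR, which is consistent with a 510-residue peptide.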

Homologous proteins:
Top 10 BLAST Hits
                                                                  Score   E
gi|2117369|pir||A29368 prostaglandin omega-hydroxylase (EC 1.14.   521    e-146
gi|117166|sp|P10611|CP44_RABIT CYTOCHROME P450 4A4 (CYPIVA4) (P.   520    e-146
gi|164981|gb|AAA31232.1| (J02818) cytochrome P-450p-2 [Oryctola.   520    e-146
gi|1656|emb|CAA40493.1| (X57209) omega-hydroxylase cytochrome .    518    e-146
gi|89989|pir||A34260 laurate omega-hydroxylase (EC 1.14.15.3) c.   517    e-145
gi|117167|sp|P14579|CP45_RABIT CYTOCHROME P450 4A5 PRECURSOR (C.   516    e-145
gi|203787|gb|AAA41038.1| (M57718) cytochrome P-450 IVA1 [Rattus.   510    e-143
gi|89992|pir||B34160 cytochrome P450 4A7 - rabbit >gi|164985|gb.   510    e-143
gi|3738263|dbj|BAA33804.1| (AB018421) cytochrome P-450 [Mus mus.   509    e-143
gi|8393238|ref|NP_058695.1| cytochrome P450, subfamily IVB, pol.   508    e-143

BLAST to dbEST:
                                                                  Score   E
gb|AW812435|AW812435 CM1-ST0181-261099-026-a02 ST0181 Homo sapi.   1092   0.0
gb|R56515|R56515 yg94d06.r1 Soares infant brain 1NIB Homo sapie.   769    0.0
gb|AA337301|AA337301 EST42040 Endometrial tumor Homo sapiens cD.   640    0.0
gb|AA652746|AA652746 ns65c09.s1 NCI_CGAP_Pr22 Homo sapiens cDNA.   636    e-180
gb|AA863360|AA863360 oh04f03.s1 NCI_CGAP_Kid3 Homo sapiens cDNA.   599    e-168
gb|AA319338|AA319338 EST21550 Adrenal gland tumor Homo sapiens .   555    e-155
gb|BF355963|BF355963 CM1-HT0878-060900-398-b08 HT0878 Homo sapi.   381    e-103
gb|BF445825|BF445825 nae41d04.x1 Lupski_sympathetic_trunk Homo .   365    5e-98
gb|AA557324|AA557324 nl81a02.s1 NCI_CGAP_Br2 Homo sapiens cDNA.    357    1e-95
gb|AV683266|AV683266 AV683266 GKC Homo sapiens cDNA clone GKCDQ.   323    2e-85
gb|AW264444|AW264444 xr03d03.x1 NCI_CGAP_Brn53 Homo sapiens cDN.   242    5e-61

EXPRESSION FOR MODULATORY USE:

library source:

Expression information from BLAST dbEST hits:

gb|AW812435|  Stomach
gb|R56515|    Soares infant brain 1NIB
gb|AA337301|  Endometrial tumor
gb|AA652746|  normal prostate
gb|AA863360|  kidney
gb|AA319338|  Adrenal gland tumor
gb|BF355963|  head neck
gb|BF445825|  Lupski_sympathetic_trunk
gb|AA557324|  breast
gb|AV683266|  hepatocellular carcinoma
gb|AW264444|  brain

Expression information from PCR-based tissue screening panels: 
Whole brain 



Figure 1A 

1 MEFSWLETRW ARPFYLAFVF CLALGLLQAI KLYLRRQRLL RDLRPFPAPP 

51 THWFLGHQKF IQDDNMEKLE EIIEKYPRAF PFWIGPFQAF FCIYDPDYAK 

101 TLLSRTDPKS RYLQKFSPPL LGKGLAALDG PKWFQHRRLL TPGFHFNILK 

151 AYIEVMAHSV KMMLDKWEKI CSTQDTSVEV YEHINSMSLD IIMKCAFSKE 

201 TNCQTNSTHD PYAKAIFELS KIIFHRLYSL LYHSDIIFKL SPQGYRFQKL 

251 SRVLNQYTDT IIQERKKSLQ AGVKQDNTPK RKYQDFLDIV LSAKDESGSS 

301 FSDIDVHSEV STFLLAGHDT LAASISWILY CLALNPEHQE RCREEVRGIL 

351 GDGSSITWDQ LGEMSYTTMC IKETCRLIPA VPSISRDLSK PLTFPDGCTL 

401 PAGITVVLSI WGLHHNPAAV WKNPKVFDPL RFSQENSDQR HPYAYLPFSA 

451 GSRNCIGQEF AMIELKVTIA LILLHFRVTP DPTRPLTFPN HFILKPKNGM 

501 YLHLKKLSEC 
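
The start of the peptide can be reproduced by translating the cDNA from its start codon. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` and the variable `orf_start` are illustrative, and `orf_start` holds only the 30 nucleotides beginning at the start codon (cDNA position 190), not the full open reading frame.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons (e.g. with N) give 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Illustrative fragment: the first ten codons of the open reading frame.
orf_start = "ATGGAATTCTCCTGGCTGGAGACGCGCTGG"
print(translate(orf_start))  # MEFSWLETRW
```

The translated fragment matches the first ten residues of the peptide sequence shown above.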



FEATURES: 

Functional domains and key regions: 
[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 

206-209 NSTH 

[2] PDOC00004 PS00004 CAMP_PHOSPHO_SITE 

cAMP- and cGMP-dependent protein kinase phosphorylation site 
Number of matches: 2 

1 265-268 RKKS 

2 505-508 KKLS 

[3] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 
Number of matches: 4 

1 159-161 SVK 

2 278-280 TPK 

3 292-294 SAK 

4 374-376 TCR 

[4] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 
Number of matches: 9 

1 4-7 SWLE 

2 104-107 SRTD 

3 172-175 STQD 

4 176-179 TSVE 

5 207-210 STHD 

6 292-295 SAKD 

7 300-303 SFSD 

8 302-305 SDID 

9 393-396 TFPD 



[5] PDOC00008 PS00008 MYRISTYL 
N-myristoylation site 
Number of matches: 5 

1 25-30 GLLQAI 

2 298-303 GSSFSD 

3 353-358 GSSITW 

4 451-456 GSRNCI 

5 457-462 GQEFAM 

[6] PDOC00081 PS00086 CYTOCHROME_P450 
Cytochrome P450 cysteine heme-iron ligand signature 

448-457 FSAGSRNCIG 
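
Motif listings of this kind can be reproduced by scanning the peptide with the corresponding patterns. The sketch below is an illustration, not the tool used to generate the figure: it assumes the commonly cited PROSITE definitions N-{P}-[ST]-{P} (N-glycosylation) and [ST]-x(2)-[DE] (casein kinase II), rewritten as regular expressions, and it scans only residues 201-250 of the peptide. The function name `scan` is hypothetical.

```python
import re

# PROSITE-style patterns expressed as regular expressions (assumed definitions).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # [ST]-x(2)-[DE]
}


def scan(sequence: str, pattern: str, offset: int = 1):
    """Return (begin, end, match) tuples with 1-based inclusive coordinates.

    `offset` is the residue number of the first character of `sequence`.
    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", sequence):
        begin = m.start() + offset
        hits.append((begin, begin + len(m.group(1)) - 1, m.group(1)))
    return hits


# Residues 201-250 of the peptide, copied from the sequence listing.
segment = "TNCQTNSTHDPYAKAIFELSKIIFHRLYSLLYHSDIIFKLSPQGYRFQKL"
for name, pattern in PATTERNS.items():
    print(name, scan(segment, pattern, offset=201))
```

For this segment the scan reports NSTH at 206-209 and STHD at 207-210, in agreement with entries [1] and [4] of the feature list.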

Membrane spanning structure and domains 

Helix  Begin  End  Score  Certainty 
1      12     32   1.638  Certain 
2      76     96   1.029  Certain 
3      316    336  1.077  Certain 
4      395    415  1.443  Certain 



BLAST Alignment to Top Hit: 

>gi|2117369|pir||A29368 prostaglandin omega-hydroxylase (EC 1.14.15.-) 
cytochrome P450 4A4 - rabbit 
Length = 510 

Score = 521 bits (1328), Expect = e-146 
Identities = 246/493 (49%), Positives = 355/493 (71%), Gaps = 1/493 (0%) 
Frame = +1 

Query: 235 LAFVFCLALGLLQAIKLYLRRQRLLRDLRPFPAPPTHWFLGHQKFIQDDN-MEKLEEIIE 411 
+A + L L LL+A +LYL RQ LLR L+ FP PP HW LGH + Q+D +E++++ +E 
Sbjct: 21 VAALLGLLLLLLKAAQLYLHRQWLLRALQQFPCPPFHWLLGHSREFQNDQELERIQKWVE 80 

Query: 412 KYPRAFPFWIGPFQAFFCIYDPDYAKTLLSRTDPKSRYLQKFSPPLLGKGLAALDGPKWF 591 
K+P A P+W+ +A +YDPDY K +L R+DPK+ K P +G GL LDG WF 
Sbjct: 81 KFPGACPWWLSGNKARLLVYDPDYLKVILGRSDPKAPRNYKLMTPWIGYGLLLLDGQTWF 140 

Query: 592 QHRRLLTPGFHFNILKAYIEVMAHSVKMMLDKWEKICSTQDTSVEVYEHINSMSLDIIMK 771 
QHRR+LTP FH++ILK Y+ +M SV++MLD+WE++ S QD+S+E+++H++ M+LD IMK 
Sbjct: 141 QHRRMLTPAFHYDILKPYVGLMVDSVQIMLDRWEQLIS-QDSSLEIFQHVSLMTLDTIMK 199 

Query: 772 CAFSKETNCQTNSTHDPYAKAIFELSKIIFHRLYSLLYHSDIIFKLSPQGYRFQKLSRVL 951 
CAFS + + Q + Y +AI +L+ ++F+R ++ + SD +++LSP+G F + ++ 
Sbjct: 200 CAFSYQGSVQLDRNSHSYIQAINDLNNLVFYRARNVFHQSDFLYRLSPEGRLFHRACQLA 259 

Query: 952 NQYTDTIIQERKKSLQAGVKQDNTPKRKYQDFLDIVLSAKDESGSSFSDIDVHSEVSTFL 1131 
+++TD +IQ+RK LQ + + +++ DFLD++L AK E+GSS SD D+ +EV TF+ 
Sbjct: 260 HEHTDRVIQQRKAQLQQEGELEKVRRKRRLDFLDVLLFAKMENGSSLSDQDLRAEVDTFM 319 

Query: 1132 LAGHDTLAASISWILYCLALNPEHQERCREEVRGILGDGSSITWDQLGEMSYTTMCIKET 1311 
GHDT A+ +SWI Y LA +PEHQ RCREE++G+LGDG+SITW+ L +M YTTMCIKE 
Sbjct: 320 FEGHDTTASGVSWIFYALATHPEHQHRCREEIQGLLGDGASITWEHLDQMPYTTMCIKEA 379 

Query: 1312 CRLIPAVPSISRDLSKPLTFPDGCTLPAGITVVLSIWGLHHNPAAVWKNPKVFDPLRFSQ 1491 
RL P VPS++R LSKP+TFPDG +LP G+ + LSI+GLH+NP VW+NP+VFDP RF+ 
Sbjct: 380 LRLYPPVPSVTRQLSKPVTFPDGRSLPKGVILFLSIYGLHYNP-KVWQNPEVFDPFRFAP 438 

Query: 1492 ENSDQRHPYAYLPFSAGSRNCIGQEFAMIELKVTIALILLHFRVTPDPTRPLTFPNHFIL 1671 
+++ H +A+LPFS G+RNCIG++FAM ELKV +AL LL F + PDPTR +L 
Sbjct: 439 DSA--YHSHAFLPFSGGARNCIGKQFAMRELKVAVALTLLRFELLPDPTRVPIPIARVVL 496 

Query: 1672 KPKNGMYLHLKKL 1710 
K KNG++L L+KL 
Sbjct: 497 KSKNGIHLRLRKL 509 



Hmmer search results (Pfam): 

Model    Description                    Score   E-value   N 
PF00067  Cytochrome P450                416.5   2.5e-121  1 
CE00363  E00363 glycine_receptor_beta   2.1     4.7       1 

Parsed for domains: 

Model    Domain  seq-f  seq-t     hmm-f  hmm-t     score   E-value 
CE00363  1/1     210    233  ..   481    504  .]   2.1     4.7 
PF00067  1/1     46     504       1      497  []   416.5   2.5e-121 
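
The two Pfam rows differ sharply in significance, which a simple E-value filter makes explicit. The sketch below is illustrative only: the `DomainHit` type, the function names, and the cutoff of 1e-5 are assumptions chosen for the example and are not taken from the original analysis; the protein length of 510 is taken from the peptide sequence.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    model: str
    seq_from: int
    seq_to: int
    score: float
    e_value: float


# Rows transcribed from the "Parsed for domains" table.
HITS = [
    DomainHit("CE00363", 210, 233, 2.1, 4.7),
    DomainHit("PF00067", 46, 504, 416.5, 2.5e-121),
]


def significant(hits: List[DomainHit], max_e_value: float = 1e-5) -> List[DomainHit]:
    """Keep hits whose E-value is at or below the (assumed) cutoff."""
    return [h for h in hits if h.e_value <= max_e_value]


def coverage(hit: DomainHit, protein_length: int) -> float:
    """Fraction of the protein spanned by the domain (inclusive coordinates)."""
    return (hit.seq_to - hit.seq_from + 1) / protein_length


for hit in significant(HITS):
    print(hit.model, f"{coverage(hit, 510):.0%}")
```

With this cutoff only the cytochrome P450 model (PF00067) is retained, and its match spans residues 46-504, about 90% of the 510-residue peptide.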

1 CCAGCCTCTC TTAGGCTCCT AAATATAGTG CAAAAAGTTC CAGAGTTCCT 
51 TTGTTACCCA TGAAAGCACA TGGAACGGTG CTGGACAGGG GCAACTGGCC 
101 CTGGAGCAGA GGAGTAACTG CATAGAACTG TCCAAGCCTC AGAGGGAGTC 
151 ACACCACCAG CAAGAACCTG GGTGGGAGTA GGTGAGCCAA GGGGTTCCCA 
201 GGCTCTGACC CTGCCAAGAG AACTCATTAG AAGGTCACCA ACCACACATA 
251 CTATTCCTCG GTCTCATGAA GAACCCAGGG ACCGGACCAG GCAAGATATC 
301 ACAAAGCTGA AGTTTCAGCT CTGGGGCAGA GCATGGATCT GAGGTCTTTG 
351 GCCCTACCAC CATGCGATCA TATGAGGGCC ATCATACAAC CATCATGATT 
401 TGGGGGAGGA ATAGGGCATA GAGGAATCAT ATGAAAAGCT GAAATGCCAT 
451 GAGTTACCCA GAAGAAGCTG TGTAAGCCAG AGGATTCTGA GACCCTGTCA 
501 AATAACAACA TCTAGTTGAA GGTTGGAGTT AGGTAGGAGG TAGGGAAGTC 
551 TGGGAAAGAA GGAGCTGAAA CACTTGCTGT GTGTGGCTTA ATGGAACATG 
601 CAAGGGGCCA GGACGAACTT GGTCCAGATG AAGTCACCAC CCCCTGGGGC 
651 CTGTCTTTTT TTTTTTTTTT tTTTTTTTTT TGAGACGGAG TCTCACTCTG 
701 TCACCAGGCT GGAGTGCAGT GGCGCGATCT CGGCTCACTG CAATCTTTGC 
751 CTCTCGGGTT CAAGCGATTC TCCTGCCTCA GCCTCCTGAG tagctgggat 
801 tacaggcgcg cgccaccacg cccagctaat tttagtactg ttagtagaga 
851 tggggtttca ccatcttggc caggatggtc ttgatccctt gacctcgtga 
901 tccgcccgcc tcggcctccc aaattgctgg gattacaggc gtgagccacc 
951 gcgcccggcc ccctggagcc tgtcttaatc acttacccgc caaataaaat 
1001 ctggctccag agagtggagc gtaggcttaa ggaattgggg gcggaagggc 
1051 ggggaaggtg ggggagggac agtgataggg agaacaggga attgtagcag 
1101 AAATTGGGTT TATTGTTCAG AGCTGTCAAT GAACACTTAA CATATGCCTG 
1151 TCTTAGCCTA AATCAATGAA TAAATGAATG AATAAATAAA TGAATGAAAT 
1201 GTGGGCAATG CCTATAAAGA TTGCTGGGAC AGGGAGGTGG GGGGAGACAC 
1251 CAGCTTGGGA AGTCAGGCCT GTTAGATCCT AGTTCACCAC CTGATACGTT 
1301 ACAAATACTA AAACCATCAC TTTCAAATTA TTTTTACTAC ATTTTCCTGT 
1351 TATCTGTACT CGAGTTTATT TATGTTTCTG GCATCTAGAG TCAGCCCTTC 
1401 ATGGGCATGA GACCCAAGCA GCCACACGAG GCTCTGAACC CAGAAGAGCA 
1451 TATGCTCGGT TTAATGGTCT GTCATCTTAG AATTGTTAAT AAAGTTTTTA 
1501 TCCCGCATTT TCATTTTGCA CTGAGATTCA TAAATTATAT AGCAGGCCCT 
1551 GACTGTACCT GTATAGTGGA ATTACTATAT GATGGTACGC TACTGTGCAT 
1601 ATCTTCCCCG TTCAGTGTTC AGTGCCCTCG TATCGGCAGC TTGAACTAGC 
1651 TCATGGTACA CGCTGGGAAT CAGGGTGGGA ATCAGTTGTA AACCATTTAC 
1701 CGGAACACCA CTAGGCAGGC CACAGGATAA AGGAATAATG ATGGTACACC 
1751 TCCCCCTACC TCTACCACCT GGGAATTTTG GTAGAATGCC AGAATGGAAA 
1801 AGAAAATCTC TTGCATAGCC ATTTATAATT TGTGATAAGG AAGAAAAACA 
1851 ATGACCTCAG CTTTAGCATT ATTTTACAAT ATAAATTCAG ATCCCGTGAC 
1901 TGAAAACTGT TGGACTTAAA AGAGGACGCT CCAGGAGCGC AAAAGCAGTT 
1951 GGGCCGAACG AAGCGTGCGC GCTTTGGTAA CCGGCTAGAA ATCCCGCACG 
2001 CGCGCCTGCC TCCTCTCCCC AGGCCTGAGC TGCCCCTCCC ACTGCCTTTC 
2051 CTTCTTCCCG CGAGTCAGAA GCTTCGCGAG GGCCCAGAGA GGCGGTGGGG 
2101 GTGGGCGACC CTACGCCAGC TCCGGGCGGG AGAAAGCCCA CCCTCTCCCG 
2151 CGCCCCATGA AACCGCCGGC GTTCGGCGCT GCGCAGAGCC ATGGAATTCT 
2201 CCTGGCTGGA GACGCGCTGG GCGCGGCCCT TTTACCTGGC GTTCGTGTTC 
2251 TGCCTGGCCC TGGGGCTGCT GCAGGCCATT AAGCTGTACC TGCGGAGGCA 
2301 GCGGCTGCTG CGGGACCTGC GCCCCTTCCC AGCGCCCCCC ACCCACTGGT 
2351 TCCTTGGGCA CCAGAAGGTA AATGGAAGGG AAAAAGGNTA GAAAAGGAGG 
2401 AAGAGGGGGG CGGAGGAGGA TGCGGCAGAG GAGCCCAGCC GGCAGAGAGA 
2451 CGCAGCTTTC TTCCATCCCT GGGGACCCTC CGGCTTGCAC CGGCCTTTCC 
2501 AGCCCGGCCT GTGGCTCTTA GCATCATTTT TCCTTGCTCT GGAGAATTGC 
2551 TTTCCCGCAG CCCCACAGGG AAAGGTCACA AAAGAGGAAG CTTTGGGGGC 
2601 TGGGAGAGAG CTATTTAAAG AACCTGAATA TGGAAAAAGA AAGCGAGCTG 
2651 TAACTCAAGT CTGTCTCTCA TTGCTTCACC AAGCCTTCCA CATGTGTTGC 
2701 TTTAAAAATA GCATGTTATT CTAAATAACT TATTAGTTGC AGAAAATATG 
2751 CAAAATCTAT CCCAATCGTT GGCACCCTTA GTCCATTTTA ACAAGAGAAA 
2801 ATTTTCTTTT CCTAAGATTC TTGTGAAGTA AGGAGCAGCC CCAGCCAGCC 
2851 ACTCGAGAAA TACTGATTGA TGGAAATTTG TAAAGGGAGA CTGTTAGCTT 
2901 TTGGTCTCTC CCGTTTTTTA AATCCACTCC CACCCCTAAT TAAGGTTTTT 
2951 ATTCATTCAA CCGACTCTGA GTGGCAATTG TGTGATAGGT ACTAAGATTA 
3001 CAAAGAGAAG CTAAGTCCCT CCCCTGCACC ACCCAAGTCA GGTGCAGACT 
3051 TAGGCCACAG AGAGAAAATG AAAATTTAAG GCAATGGGTG CTTTACTAGA 
3101 GGCCTAGAGA CAAGGGAATA TCTGTCGGAG GAAAGTATAC ATCTCCGCCT 
3151 AGAGAAGGAA GGAAAGTCTG TGAAGGGCTG AGCAGAGTCT TAAAGGATGG 
3201 TTGGGTGGTG TGGGGAAGGC ATTCCAGCAG AGCTACTACA CGATCCTTTG 
3251 GTTTCCCCAC TTTCTAGTCT TTCTTATATA AAGCAACCAC TTTCAACTCT 
3301 TTTATCGGTT TCTTCTGGTA TTTAAATACT TATTTGTAAA ATAGTATTAC 
3351 CATATTGCAT CTATTAATTT AATAAGTTTA GACATCTGCT GTGGTTTAGA 
3401 TATGGTTTGT TCGTCCCCAC CAAGCGTCAT GTTGAAATTT GATTCCCAAT 
3451 GTTGGAGGTG GGATCTGATG GGAGATCTTT GGGTCATTGG GATGGATCCC 
3501 TCATGAATGT CTTGGTGCAG CTGTCTCCTT CATAAGTTCT CACTCTCTTA 
3551 GTCCCTCTTC AACCCCCAGA ACTGATTGTT GAAAAGAGCC TGCCACCTCC 
3601 TCCCCTCTCT CTTCCTGTCT CTCACCATGT GGTCTCTGCA CACAACTGCT 
3651 CCTGTTCACT TCCACTATGA GTGGAAGCAG TCTGAGATCC TCCGCAGATG 
3701 CAGATGCCAA TGCCATGCTT CTTGTACAGC CTGCAGAATT GTAACCCAAA 
3751 TAATCCTCTT TGTGAATGAC CCAGCCTCAG GTATTCCTTT ACAGCAACAC 
3801 AAATGTACTA AGACAACATC CACCTATGAA CTTCTTTATG ACAGGCAATC 
3851 ACTTACACTT CATATTCCAC TGTCCCAGTA ACTATATAGT ATTGTATTTT 
3901 TTAAATAGAA AAACTTCTAT TTGTATTATT TTTATTATGC AAATGTTATT 
3951 TACTGCTGAT CTAAATGGTC CTCTTTCATT TTATTTCCTT TTCTCATAGA 
4001 ACTTTTTCCC CACCCCCACA GTATTGNNNN NNNNNNNNNN NNNNNNNNNN 
4051 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
4101 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
4151 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
4201 NNNNNNNNNN NNNNNNNNNN NTGTTATGTA TCTCTACTGT CTCATGAATA 
4251 CTATGTCGTC TGTTGTTTTA ATTGAATTGT TTTGGCATCC TTGTCAAAAA 
4301 TCAATTGACC ATAAATGTCA AGGTCTATTT CTGAGTCTTC AATTCTAATC 
4351 CATTGATCTA TATGTCTATC CTAACTCATG GACACAGAGA GTAGAAGGAT 
4401 GGTTACCAAA GGCTGGGAAG GATAGAGGGG AGCTGGGGGA GGAGGTAGGG 
4451 AAGGTTAATG GGTACAAAAA AAATAGAAAG AATGAATAAC ACCTACTATT 
4501 TGATAGCATA GCAGGGTGGC TATAGTCAAT AATAACTGTA CACTTTTAAA 
4551 TAAAGAGTGT AATAGGATTG TTTGCAACTC AATGGATAAA TGCTTGAGGG 
4601 GATGGGTACC CCATTCTTCA TGATGTGCCT ATTTCACATT GCATGCCTGT 
4651 ATCAAAAACA TCTCATTTAC TCCATAAATA TATACACCTA CTATGTATCC 
4701 ACAAGTATTA AAAATTATAA ATAAATAAAT TATATAGCTA TCCTTATGCT 
4751 AGTACCACAC TGCCTTACTG TTGCTTTGTA GTAAGCTTTG AAATCAGGAA 
4801 GTATGAGTCC CCCGCACTTT GGTATTTTCC AAGATTATTT TGGCTGTTTG 
4851 GAATCCTTGA TTTCTATACA AATTTTAGAC TCAGCCTATC AATTTCTACA 
4901 AGGAAACCAG CTAGGGTTCT GCTTGGGATT GCACTGAATC TGTAGATGAG 
4951 TTTGGGGATT ATTGCCATCT TAAGAATATT AGGTCTTCTG ATCCATGAAC 
5001 ACAGAAAGCC TTTCCGTTTA GTTAGGTCAT CTTTAATTTT TTTTGTTGTT 
5051 TTTTTTTGTT TTTTGAGACA GAGTCCTGCT CTGTCGCCCA GGCTGGAGTG 
5101 CAGTGACGCA ATCTCGGCTC ACTGCAACCT CCGCCTCTCG GATTCAAGCG 
5151 ATTCTCCTGC CTCAGCCTCC CAAGCAGCTG GGACTACAGG CACATGCCAC 
5201 CACACCAACT AATTTTTGTA TTTTCAGTAG AGACGGGGTT TCACCATATT 
5251 GGCCAGGCTA GTCTCGAACT CCTGACCTCG TGATCCACCC GCCTCACCCT 
5301 CCCAAAGTGC TGGGATTACA GGCGTGAGCC ACCACTCCCG GCTTTCTTTA 
5351 ATTTTTTTTA ACGATGTTTT TGTATTTTTC AAAGTATACA TCTTGCATTT 
5401 CTTTTGTTAA ATTTATTTGT TTTGTTCTTT TTAATTTCAT TTCAGACTAT 
5451 TTATTGCATT CATAGTGTTT TAGAGTCCAC ATTCCCTCTT GACTGTCACT 
5501 AAGTTTTTTT TTTTCTGTTT TTGAGAGGTT TCTATCAGAA TTTTGCAGAT 
5551 CAGAGATGAC GGACATGTCA AACTGTCTAA TATTACCAAC CCTCCCCATT 
5601 TATCAGATCA GGATCCTTTT GGTGATTCAC CATGCAGGGA AATCTAGTAT 
5651 CTAAGGCTCA AAAGGTGATA CTGTTTTACA TAGGCAGTAA CATTTTATTG 
5701 CTACATAATA ACTACATATT TATGGAGTAC CTGTGATATT TTGATACGTG 
5751 CATACAATGT GCAGTGATCA AATCAGGGTG TTTAGGGTAT TCATCACTTC 
5801 TAACATTTAT TATTTATTTG TGTTTGGAAC ATTTCAAGTC TCTTCAAGCT 
5851 CTTCAGAAAT ATTCAATACA TTATTGTTAA CAGTGCTATT GAACACTGGA 
5901 ACTTATTCCT TCTATCTAAA GACAGTAACA TTTTAAGTAT AGTCATAAGG 
5951 TTACAGAAGG ATAAAGTGTG TATAGGGAAA ATTCCCTACA AGATGAGAAT 
6001 TTCATTCCTT ACTCTTAGTA ATACAGGTCT TCAAACATGC CAAGGATATT 
6051 CCTCCCTTGG AGCTTTGAAC ATGCACGTCT GTGGTTATAT TGCTCTCCCT 
6101 GCAAATTATT CCTAAAAGAG GCTTGCCCTG ACCATTCAGA CTAAAATAGC 
6151 ACCTCTAGTA CTCTCTATCT CCAACCCTAT TATTATTATC TTGGCCCTTA 
6201 TCACTCTCTG ACACTATACT GTATACTCTT TTGCTTGTTC GTTTATTATC 
6251 CACCACTAAC TACAATATAA AATCTGTGAG AGGTAGGATC TTTGTTTGCC 
6301 ACTATAAACC TAGTGCATGG TACAGTTCCT GGTGCATAAT AGGTGCTCAA 
6351 TAAATCCTTT GTTGAATGCA TAAATATATT AGGTGCTGAG AAAATTTATT 
6401 TATTCAAAGA TCAATTTACT GCATAGAATA GGCCAGGTGG TTTGACATTT 
6451 ATTCAATAGC CAACATATGG GACCTAGGAT GTACATATGC AAGTGTGTGT 
6501 GTGTATGTGT GTGTGCATCT GCATGTGTAC TTGGATGTAC TGCAGAGAAC 
6551 ATCTATGTAG CTAAGTAGTA TAAAGCACTT GGGCTCCAGA GTTAAACTGG 
6601 AGTTTGAATC CTCATTAGTG GTTGCCAGCT GTACACACTT GGGCAGATCA 
6651 TTTAACCTAG TCTGTAGGGC TCAATTTCCT CATCTCTAAA GTAGGGATTG 
6701 TAATCATATC TACTTCATAG GGTTCTTGAT GTAAATATTA AATAACATAG 
6751 AACATGGAAA GCATTTAGCA GCACCTAGTT CATAGCAGTG CTTGATAAAT 
6801 GTTCGCTGTT GCTATTTGGG GGCACTATGC ATTTTCTGAA CATTTCTGAA 
6851 CAATGTTTAC TAAATATATG TAGTACCCGT TTTCAAGTGT ATTTAGATGC 
6901 TTCTCTGGGG ATGAAGAAAT ATAAATTAAA TATAGTACAG TATTCACAAC 
6951 AGTTTTCTGT CCTTTTTGTC TAGTCAGGAG TTACAAAAAG TATAATGAAA 
7001 TACTTTCATA TGGCTGGGGT GTTTATGAAA ATTTTTTACC TAAACT^AACA 
7051 ATTGTCATAT TAGTTTACAA TATTCATGAG GGCAAAGGCC TTGTCTTCCT 
7101 TATATTTCTC TGTATCTCTA CCACCTGGTA CGTGTGATAG ACAATAAATA 
7151 CTTGTGTGTT TATTGTTTGT AAATGAATAA ATGAAAAAAT ATTCACATTG 
7201 TTGAAAACCA CTACTCTGGA TAGTCAGTGG GTGCTTATCA CTGGCTTGAT 
7251 TATGGCAACA TTAACAAAAA AGTGCAGTAT TTTAGAAACT AGGTTTCAAG 
7301 ACTCTCAACC TTTCAGTGGC CTTGAACTAT CCAGAGAACA CTTTATGGGT 
7351 TAAAATTGCT AAATGATAAC AGAGAAAAAT GGGAGCCAGA GTTGTCCACC 
7401 TCTCCAGAGG ATGAGAGCAA ACAATCCTGC AGCAGATACC GTGTGATTGG 
7451 TCACACGAGG AAAAATCTGG CAGCCTTAAG ATTACTTTGC AGCGGGGGAC 
7501 TCCCACCATC ATGCTCAAGT GTGTAGATGG GCACACCAAA ACACACACAT 
7551 GCAGGTGCCC TCCACTTTAC ACAAGAAGCA AATGTAAATG AATCTTGTTT 
7601 TCAGTGATTT AGAGAAACAA TTTAAGTGAG CCATTACTCA TCTGCTTCTA 
7651 AAAGCAAAAA CTCCTTCTCT GGTGGTAGTA TTTGCACTCT CATTTGTAAA 
7701 TGTTGGAAGC TGAAAGTTTT GTATTTGAGT TTGCTTTAAG ATTCACACAT 
7751 CTGTGTAAAT GGACCTTCTG TTGTTGGGGG GAGAATTTGG ATTTTCTTTA 
7801 TAGATAGAGT TGGCAATTTT TTAGAGAGAA GCATTTACTG CTAAGTCATG 
7851 AGAAATAATC ACTGGTGCAT AATTAGAGAG AGGAACAGGA AGAAGAAATG 
7901 GTGAGCTGGA TGTAGGGTCA TGCCCCATTT AGTAACTGTT AGTTTCCCAC 
7951 ATAGGAAATA CTTCTTTTTA GCTTCCAGAT CCCACTCCAA TCTGAGTGTG 
8001 TGATGTTGGC AAGTGAGGCA GAGAGTGTGA CTCGGCTCAC CCTCTATTGG 
8051 GACAAGAGTT CACAGTAAAT GTCATTCAAC AGTGACTTGG TCTGGGGGTA 
8101 CAGGATATAT TAATATTGAG AAGATAAATA CACTAACTTT GTTTAGAGAA 
8151 TTATCCCCCA AGCTTAGAAG TCCCAAAGAA AGCATGTTAT GTCACTTCCA 
8201 GAAAAGTCTC AGGCTCCTCT GCTTGTGTGA CCTTATCAGG TCCTGAACTC 
8251 AGCTTGTGTC TATAAGAGGG GACAGGTCCA GCTTGGCTGG CTAATTACTT 
8301 TTACT^TTTTT CACTGCAGTT TATTCAGGAT GATAACATGG AGAAGCTTGA 
8351 GGAAATTATT GAAAAATACC CTCGTGCCTT CCCTTTCTGG ATTGGGCCCT 
8401 TTCAGGCATT TTTCTGTATC TATGACCCAG ACTATGCAAA GACACTTCTG 
8451 AGCAGAACAG GTAAGAAGAG GGGGAAAGCT CTGGGACCTA TTCCTCCTAG 
8501 AAGTGAAATG CATAAAACCC ATAGGCAAGA TTCCAAAGCA AAGATTGGTT 
8551 TGGGGCCTTT AAGAGACACA GCAGCAAGTA TGGGGAGGTG ACAGGTTTCC 
8601 TACCAATACT GAAGGGGATT CCCATATCCT CCCCAGTCCC TTGTCTTGTT 
8651 CAGGTATGCA TGGGCACGTT GAAGTCGGTA TAACTTAAAG CCTAGCTGGC 
8701 ATTACCAGAC TTGCCAGGCA AGGCTTCCCT TGGCCTCTGT GGGTTTTATG 
8751 ACTTCAGTGT CAGCAACACT TCCCACTCCT ACCCCTGGTC TCGAGCATAA 
8801 GTCTCAAGAG GGTGGGAAAT CAGCAGTAAC TCTACCTCTG CTGGTTCAGT 
8851 ATGAAAGCCT GAATGCTAGA TCATTAATTT ACCCATCAGA CCTCTTGATN 
8901 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
8951 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9001 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9051 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9101 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9151 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9201 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9251 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9301 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9351 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9401 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9451 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9501 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9551 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9601 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9651 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
9701 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNTCTGCT 
9751 TGACTCTGCA GATCCCAAGT CCCAGTACCT GCAGAAATTC TCACCTCCAC 
9801 TTCTTGGTAT GTATGTGCAA ATGAGAGGTA TAACCCACTC TCATTCAAAG 
9851 TCCCCTTTCC ATAGTAGAGC ATGCCAAAGA AACTGAAATC TGAATTCAAA 
9901 AGCACAAAGA GTGCAAGGTA GAGCTATACT GAACGTTATC TAGGGGAAAG 
9951 ATTGAAGGGG AGCTCTAAGG TCAACACACC ACCACTTCCC AGAAAGCTTC 
10001 TTCATCCGTT TCTCTCCCAC AAAGTCTTAT TCTCAAGGCA GCAGATACAT 
10051 GAATCTGTCC CCTCTCTCTT TAAAACTACA GCCTTGGCCA GGCACAGTGA 
10101 CTCATGCATG TAATCCCAGC ACTTTGGGAG GCCAAGGTGG GAGGATCACT 
10151 TGAGGTCAAG ATTTCAAGAC CAGCTGGGCC AACATGGTGA AATCCCATCT 
10201 CTACTAAAAA TACAAAAATT AGCCAGGCAT GGTAGCATGT AGGCCTGTAG 
10251 TCCCACTACT TGGGAGGCTG AGACATGAGA ATCGCTTGAA CCTAGGAGGT 
10301 GGAGGTTGCC GTGAGCTCAG ATTGTGCCAC TGCACTCCAG ACTAGGTGAC 
10351 AGAGCAAAAC f CTGTCCGCA GCCCCCAACA ACAAAAAAAA AACTACCCAA 
10401 ACTGCAGTCT CACCATCCCT ATTCTTGTTT TCTTTATCCT TCTCTCGTTT 
10451 TCTTGGATGT TTTCCTTTCT TTTTGGAGTT CCTTTATTTC CACATGCGAG 
10501 TCAGTAAAAT TTTGCTCTAG AGTTTGGCAA TATTCTGTCA GCAGATAAAC 
10551 TAAGCTCTTT AATTACATAA TTGGTATTTA TGTTAAACAA GACATGAATG 
10601 AAAGAAAAGA ATATAGGCTT GTATTAGGAA CCACTTAAAT TTGAATCTTG 
10651 CCCCCTCCTG CATTGACTAG TTAAATATGA TCTTGGGGAA GTCATTTAAT 
10701 CTCTCCCTAT CTCAGTTTCC TCATCTTTGA CAATAAGGAT GAGACTCACA 
10751 TTGCTGGGCT GTTATGAGGA TTAAATGAAA TACATATTTT TAGCACTACA 
10801 TGTAATGGCC ACCATTGTAT GAGTGACAGA TCATGCATCA TGAGCCTGGA 
10851 ATGTTGTAAG CATTCMTGA ATGGTATCAA TTATGTATTA ATAAACTTTA 
10901 AAGTCCTTTT AAAGCCAAAT CCTAATGACC AGTCTGGCAA TAGAAGATTG 
10951 TGAAGCATTA GCCTTGGTAA GTATTTCCAC ATAGTATCAT TCATAGACCT 
11001 GGGCTCAAGG AGGAAATATC AGGGGACAGA GTGGACACTC TTGTCTCTTT 
11051 CCTTGTGAAT TTATGTTCAT CATATAGTTT ATGGATTGGT TTGGAGTGGA 
11101 AAGGAATTCA CTTGCTCTGT TACTAGTGTG AGCTAGGGAG TAGGTTGGCT 
11151 ACCTTATGTA TTCACTTTCA GTTAACCTCC ACAGCAACAC AGGGAAAAAG 
11201 GTATTTAGTA TCATAGTTCA TTATTGAGAA AAGTAAACCT CAGGAAGATT 
11251 GAGTCACTTA TTCAGTTACT ACATAGGTAG TAACTGGTGA TTTCAGGATT 
11301 AGCGTGCTAA TCTTATAAGG CTTTGAAATT TATTAGACTT TGAAACTGTT 
11351 TCTCACAATA TTAAATACAT CCATCCCAGA GGTAAGCTTC TAAATTCACC 
11401 TTCATCTATT AAATTGCATT GCACATTAAT ACGAGTACTA CTTTGATACT 
11451 CCACTGTTGC ATGACTGCCT GTGGGTCATG GTTACTCCAC GCTGCCTGTG 
11501 TTCCTCATCT ATCCTTCATC TCATCTAATT AAATGGCATA AGGTTTTCTG 
11551 CCTTTTATTT CTCAAGGAAA AGGACTAGCG GCTCTAGACG GACCCAAGTG 
11601 GTTCCAGCAT CGTCGCCTAC TAACTCCTGG ATTCCATTTT AACATCCTGA 
11651 AAGCATACAT TGAGGTGATG GCTCATTCTG TGAAAATGAT GCTGGTAAGT 
11701 AAAGGGGGAA AGTGCTCTGT GCATTGCGAA ATGCTCCCAG CAATGGACAG 
11751 TATTAGGTAT GTGTTTTGTG GGCCATGAAA ATAAAAAATC AGTTTCTAAA 
11801 AATTTAACCA ATGTACACGT ACTTATTGAA CAATAGGTGT CTGTAAAAAA 
11851 TTTGTTATGT TCTTTGAGTG ATAATATTAA TAAAAAGATC TGGTCCTCTG 
11901 TCTTAGATAT ATTTTGAGAT TTTATGGCAG CAAACCAAGT ACCAAATGGT 
11951 GATAGTTAGA TAGTAAGTGC TGTAGATGTG TTTCATGGAG GGCGGGTCTG 
12001 TACAAACCTA CCCCAAAGTC TGAGGAAACT GAGAGGCTGA AGAAAAAGGC 
12051 TGACAGTTTC TTAAAAAG7VA ACATTCAATA GAGGCTTTCA AACAAAAACC 
12101 ATNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12151 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12201 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12251 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12301 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12351 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12401 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12451 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12501 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12551 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12601 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12651 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12701 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12751 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12801 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12851 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12901 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
12951 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13001 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13051 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13101 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13151 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13201 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13251 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13301 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13351 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13401 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13451 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13501 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13551 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13601 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13651 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13701 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 
13751 NNNNNNNNNN NNNNNNNNNG GTCAGGCTTT GCTGGGGGCA GCTCCCTGCA 
13801 ACAGCTCCTC TCCACACTTG CTCTGTTTCT CACTTTTGAA TCCAAACGTT 
13851 TTTGAAAATG TTCTGAGTTT ATTTTAAAAT GTGGCTATGG TGGTTGAGAG 
13901 CAGTGGCAGG GTACCTAGCA AGTTTGGAAT TGAAGTTGGA GGAAGCCCTG 
13951 GGGTAAACCC CTTGTAATTA TGGGTCTTGT GTCAATGATT GCTTTAATGG 
14001 AACTCTGGTC TGTTTGAAAG CAGAGTTATG GTAATAATTG AAAAGCCGCA 
14051 GATCTTTAAC TCAGCCATTT ACCATATATG CAGTTTTCTC CATGCTCCTT 
14101 CTCACTCCGC TGGGTGTATT TTTCCCTTCC TCGTGCCCTG TGTAAGCACA 
14151 TGGCTTATTT ACTCATGTGA TCTTTGGTTC CTGCTGGGTC AGGGTTGTCT 
14201 CCATTAGATC ATAAAAACAG GGCCAGGCAG GAGCCTTCAA ATGAAGGCAA 
14251 TTTGGTCATG GTGGTGGTGA TGATGTTGGT CTTGACCTCC TGTGCCAGGA 
14301 TAAGTGGGAG AAGATTTGCA GCACTCAGGA CACAAGCGTG GAGGTCTATG 
14351 AGCACATCAA CTCGATGTCT CTGGATATAA TCATGAAATG CGCTTTCAGC 
14401 AAGGAGACCA ACTGCCAGAC AAACAGGTCA GTGGTGGGAG AGCAAAAAAG 
14451 ATATTTCTTC ACATTTTCTA AGTTGTTTAT TAACACATTA TCCCAACTTT 
14501 CTCTTCTAGC ACCCATGATC CTTATGCAAA AGCCATATTT GAACTCAGCA 
14551 AAATCATATT TCACCGCTTG TACAGTTTGT TGTATCACAG TGACATAATT 
14601 TTCAAACTCA GCCCTCAGGG CTACCGCTTC CAGAAGTTAA GCCGAGTGTT 
14651 GAATCAGTAC ACAGGTATTT GTTGGGTTTG GGTTGCCCAC GTCCATACGC 
14701 TGCCATGATT GTACTGTGTC TGTCTAGAGG GATAAACCTT AATATGACAA 
14751 GAGAAAGAAT CTTTGTTATT AATGGAGCTT TTATATAGAC ACTGCTCCAA 
14801 AGAAATTTGA CTTGAGTCCT TTATAAGACT TTGCTTCAAC CATAGCAGTA 
14851 TTATCAGAAT TTTTATATAT ATATATATAC ACTATTTTTA TTATGGACAA 
14901 TTATTATTAA TACAAATATA AGTAGGCACT TAAGAGTTCC AGACATACAT 
14951 GGAATATGGC TTTTTGCACA GCGATTGCAG TAATAATAAT GACAAGCTAA 
15001 AAACATTCAT GCAACATAGG AATGGAGAGT GGAACAGAGT A/^ACATGGAC 
15051 ATGCACCCGA AAGAATATTG ATTCAAAAAC AGTTTTAGCA AGCATAAACA 
15101 CAAAAGTTGA AATAGATTAA GCTTTTTAAG CAATTC7UVCA TTACTTGTCA 
15151 TGAATGCCAT AATGGAGAAT ACTTATCAAG CAGTGAATTA ATCCTTCATC 
15201 AGCTTCACCA CTTACTAGCA GTTACTAGTA AGTTACTTAC TGCTTTGTTT 
15251 CAGTGTCATC TATAAAATGG AGATTAAAAA AGAACCTATC TCATACATTT 
15301 GTTGTTACGA TGAGTGGGTT AATATATATA AAGCATTTAG GACAGTGCCT 
15351 GGCACTGAAT AGATGTTAAA TGTAAAGTAT AGTTATGTCA AATGTCTTTG 
15401 CTTCCAGGAA TTTTGCAAGA CACACCAACA TATGCACACT TACACATACA 
15451 TATATGCATA CATGCACATA GATATTATAA AGAGGACACT CAGAGAAGCA 
15501 GGTTATAAAC AATTTAAGGC ATAAATGGGC ATTATAAATA GCAGCAGTTC 
15551 CCAAGTCTTT CTGCATCATT GCACACACAG AAAATGTTAA TGTTTTTGTG 
15601 CTTCATTGGA GTAAACAGGA ATGGATTTGG GGGAAGCTAT ACAGAACTTT 
15651 GTAAAAAAAA ATCTTTACTT TTTAAATATT ATACAATTAT GATGAAAAAG 
15701 CAAAATGCAA AGTGTTAGGG AAAATATTAA ATGTTAAATT TATTCAAAAC 
15751 TTAAAACCTT TTCAATTTTT XTTTTTTTTT TTTTTTGAGA TGGAGTCTCT 
15801 ATCACTCAGG CTGGAGCGCA GTGGTGTGAT CTCAGCTCAC TACAACCTCC 
15851 ACCTCCCAGG TTCAGGCAAT TCTCCTACCT CAGCCTTCTG AGTAGCTGGG 
15901 ATTACAGGCA CTGCCACCAC ACCTGGCTAA TTTTTTTAAA TTGTTTATTT 
15951 TTATTTAGTC AAATATATCA ATATTTTATT TTATTGCATC TGGATTTTTA 
16001 GTAATCACAA AAAGCCATTC TCTATTCCAG GGTTTCTCAA CCCTCAGCAC 
16051 TAATGGCTTC TTAGATTAGA TAAGTCCTTG TTGTCAAGAT GTGTGCATTG 
16101 TAGGATGTTT AGCTACATCC CTGACATCTA CCCACTCGAT GTAGTAGAGC 
16151 TCTGATAGTT ATAGCAACCA TAAATAACTC CAGACATTAT TGAATGTTCC 
16201 CAGGGCCCCC AGTTGAGAAC CACTGCCCTG TACCCAGGTT GTAGAGAAAA 
16251 TTATTTATGT TTTCTTGTAG TACTTGTATA ATTTCATTAT TTTCATATTT 
16301 AAATCAGAGA TCTAAACTCC ATTTAGAATT TATTCCTATA TATGGTGTGA 
16351 GGTATTGATC TAATTTTTCC AAATGTTTAT CCAGTTGTCC CATCACCATT 
16401 ATTTAAAAGT TTATCTTTTC AAGTGATTTG AGATAACCAT CACATTCTAA 
16451 ACGGATACAT GTACTGGTAT CTGTTTTGGA TAAGAGTATA TTTGGATGTT 
16501 CTCGTGTATT CCATTGATCT ATCTACCAAT GTACCAGAAT CACACTGTTT 

16551 TAATTAAGGA GATTTTGTGG CTTTTTTCAA CATTAATAGA CCTTATTTTT 

16601 AGAAAAGTTT TAGGTTTGCA GAAAAATTCA GCAGAAAGTA CAGAGAGTTC 

16651 TCATATTACC CATGTAACAA ACCTGTACAT GTACCCCTGT ATCTAAAATA 

16701 AAAGTTGAAA TTTTTTAAAT AGTAAATAAA TATTACCTCT GTTCCATATT 

16751 TTTGTTTTGT TTTTTTTCTC TCAGCTCCTT CAATTATAAA TATATTGGCA 

16801 TTTCTTTGCC TGTCTTCTAT TTCATTCCAT TTTATTTAAT AACTTTTCCG 

16851 TGAAGATAAA ATATTAGACT GAGGAAGAAA AGAATAATTG GTCACTTGCA 

16901 TCTAAACTTG AAATCATCTT AATTTTATTG CCCACATACT GATGGAAACT 

16951 ATGTTTTTTA TTTGTGTTGT TTATCTTTGG AGCTTTAATC AAAAGTCCCT 

17001 TTGATGAGAA AATAAACCAT CTGTGAAAAT TAGATCTATT TAAACGTCTG 

17051 GAAATCAGGC AAGATTTGAA GCTATTCACT AACCATGGCT TGCTTTATAA 

17101 TTTATTTGAC TTTGCCATCA CTTTGGTAAT TGGAAACTAT TTTTCTACCC 

17151 AGATACAATA ATCCAGGAAA GAAAGAAATC CCTCCAGGCT GGGGTAAAGC 

17201 AGGATAACAC TCCGAAGAGG AAGTACCAGG ATTTTCTGGA TATTGTCCTT 

17251 TCTGCCAAGG TAAATCTTCT AAATTTCTAA GCCTGCTCAA GTGACCAGTT 

17301 AATTATGTAA GTAGGTGGGT AAGTGGGAAT GGGATGGGGA GACAAGAATA 

17351 AAACCGATTG ACTAAATTTA ACTGTACTTT GAATTGATGA GCAGCTTCAT 

17401 GCAATTTGAG ACAAAGAGAG AATTCTGCAA CTGfGTCGCT AGAGGAGGGT 

17451 TAGTAAAGAC TAAACGAACG ATTTGACAAG ATTTGAGGAT TGTCATATGG 

17501 ATACATGGAT TTTAGGGCAT CATGAAAAAA TGGTCACATG GATAAACGTA 

17551 T^AAATTATGA TGATAAGGTC CTGGGAAATC TGGGAGTTTG AAGAGAATTT 

17601 CTAGGGCCTG TTGATCGAGG GCCCTTTGTG CAAGGCCTGC TTTTCTTATC 
17651 TAACCTTGGT TCTCCTTTAT GCTTTGGGCA GAATATGGTT TATACCACAT 
17701 ATTTGTTGAA CTGAATTAAA ATTTAAACCC CTATTTAAAG CTCTGATTTT 
17751 TCCCCTCAAA TCATTATTGT GGTTGTATCT CCAAACATTT ATAAACTGGC 
17801 ATTTTATTTA AAATATTTGT ATTGTACTTT CTAGGATGAA AGTGGTAGCA 
17851 GCTTCTCAGA TATTGATGTA CACTCTGAAG TGAGCACATT CCTGTTGGCA
17901 GGACATGACA CCTTGGCAGC AAGCATCTCC TGGATCCTTT ACTGCCTGGC 
17951 TCTGAACCCT GAGCATCAAG AGAGATGCCG GGAGGAGGTC AGGGGCATCC 
18001 TGGGGGATGG GTCTTCTATC ACTTGGTAAG ATCTGCACCC CTAAATTTTC 
18051 CTGCTAGTTT TCCCCCTGAG ATTTTGCTTT ATTTTTTGCG CTGGTACCTT 
18101 AGTGACCCTA GTGCCTCAGG ATATGTGTAG GTGAAACAGA AGAAGTAGGC 
18151 TACTTTTCTG TTCTTTCTAA AGAGAGCTCC AAATTATTCT CTTGTCTTTC 
18201 AGGAAAAAAA AAAAAGTTTA TTTATCCATA AATTGTCTGT CATTGGTTTT 
18251 CTAATCAATG GTGTGTGAAA TGTCTTATTT CTTTATTTCA CCTTGGCTCT 
18301 GATGCATTGG AAATGAGGAC TTGATCCCTG GGCTGGCACT TAGAACTTAA 
18351 ACAATAGGGT CCAAGTGGAG CTCCTCTTCT GAGAGAGCTG AATGATTAGC 
18401 TGCATTATTT AAGGCTCATT TTAGACATCT CCCAGCCGCT TGTCACCAAT
18451 TTTATTCCTC AGGATTGATT TTAGACTTCA GACATAATAT TCGATGATAT
18501 ATACTATAGT TAAGTTTAGC AAATATGGAC TGAGGACATT TTAAATACTG 
18551 AGACTTTTTT TATGACTACA ATTTATTGTG GGCCCTGTCT TCGGTGAGCT 

18601 AATGGTCTAA TACAGGAGAC AGGAGACAGA CCTCCAAATT GCAGTGTAGC
18651 ATAATGAGGG CAATGATAGA GATATGTGCT GGCTAACACA AAGACATAGA
18701 AGACAGGTAC CTACCCTGGC ATGGGAGCTC AAGGAGACTT CCTTGACATT 
18751 TACGCTGACT GCAGGATAAG TAGGAGTTAG CCAGGTGGAA ACTGTCATCT 
18801 CTATCTTGCT AGACTTTAAG CATATACTGC TGTTAATAAA GCCCAGGTTA 
18851 TGCTGTTTGC AAAGATAAAA TGTGTTCCTG ACATAATACT GGTCAAAGGG 
18901 ACAGAAAGAC AGAAATGCTA AGGACAATTC AGCAGCAGAC CAGATAAAAA 
18951 ACACCATATT TCATATGCAA AAGTCAACTC AATTGAAACA TTTGTAAAAC 
19001 CAAATTTGAC ATTATAAAAG TATATCAGAG ATCTCATTTT ATAAGGAAAT 
19051 AGAAGCCCTT TCCTACCATA AACTAAAGAT TTAATCTATA TAGCACAAAA 
19101 TACAATGTTG AGTAATCATT TTTAATTTAT TTTTTAACTG ACAAAAATTG 
19151 TGCATATACA TGTTATATAT ATATGTATGT GTGTATATAT ATATGATGTA 
19201 CAACATGATA TTTTGATATA TGTATACACT GTGGAATGAC TAAATCTATC 
19251 AATGGACATG TTCATTAACT CATACTTATC ATTTTTTTGT GGTAAGGACA 
19301 TTTAAAATCT ACCCTCTTAG CAATTTTCAA GTATACAAAT TGTTAGTAAC 
19351 TCCAATCACA TATTGTACAA TGCATCTCCT AAACTTATGC CTCCTGTCTG
19401 ACTGAAATTT TGTATCCTTT GACTAACATC CCTGTAATCC CCCATTCTCC
19451 CACAGCCCCT GGTAACCACT GTTCTACTCT CTGCTTCTTT GAGTTTAATG 
19501 TTTTAGATTT CCACATGTGA GATCATGTGG AATTTGTCTT TCTGTGCCTG 
19551 GCTTATTTCA CTTAGCATAA TGTCATCCAA ATTCATCTCT GTTGTCATAA 
19601 ATGACAAGAT ATTTGTCTTT TCTATGGCTA ATTGTTAGTC CATTGTTTAT 
19651 ATATATACCA TGTTTTCTTT ATCCATTTAT CCAGTGATGG ACACTTAAGT 
19701 TGATTTCTAT ATCTGGGCTA TTGTGAATAA TGCTGCAATG AACATGGGAA 
19751 TGTAGATGTC TCTTCAATGC ACTGATTTCA TTTCGTTTGG TTGTATATCC 
19801 AGAAGTGGAA TTGCTGCATC ATATGGTAGT TCTATTTTTA ATTTTTTGAG
19851 GAAACTCCGT ACAATTTTCC ATATGGCTGT ACTAATTTAC ATTCCAACCA
19901 AAAGTGTATA AGGGTTCTGT TTTCTCCACA TCCTCACCAA CATTTGTCTT
19951 TTTGGTAATA ACCATTCTAA TGAGCATGAG GTGATGTCTC ATTATGGTTT
20001 TAATTTACGT TTCCCTGATG ATTAGTGATG TTGAGCATTG TTTTAAATAC
20051 CTGCTGGCCA TTCATGTCTT CTTTGTAGGA ATGTTATTTT AGGTTTTTCT
20101 CATTTTTAAA TCTAGTTATT TGTTTTCTTG CTTTTGAATT GTGTGAGTTC
20151 CTCATATATT TTGAATATTA ACCCCTTATC AGATGTATCA TTTGCAGACA
20201 TGTTCTCCCA TCCTTTAAGT TGTCTCTTCA CTATGTTGAT TGTTTCCTTT
20251 GTTGTGCAGA AGCTTTTTAG TTTGCTGCAA AACCATTTAT CTATTTTTTC
20301 TTCTGTTGAC TATACTTCCA GAGTTGTATC CAAAAAATCA TTGCCAAGAA
20351 TAATATCAAG AAGCTTTTCT CTATGTTTTT TTCTAGTAGT TTTATAGTTT
20401 CAGGTCATAT GTTTAAATCT TTAATCCATT TTTAGTTGAT TTTTGTATAT
20451 GGAGTGAGAT AAAGGTCCAC TTTTATTCTT CTACTAGTGC ATATCCAGTT
20501 TTCTCAACAC CATTTATTGA AGATACTGCC CTTTCACCAC TGTATGTTAC
20551 TGGAACCTTT GTAGATCAGT TGACAATAAA TGTGTGGGTG TATTTCTGGA
20601 CTCTTTATCC TGTTTTATTA GTTTATATGT CTCTTTTTTT AGAAGCTCTA
20651 TGCTGTTTTG GTGACTAGAG CTCTGTAGTC AATTTCAGAT CAGGTAGTAT
20701 GATGCACTCC AGCTTTGCTC TTTTTGCTCA AAATTGCTTT GGCTATTTGA
20751 GTTTTTTTAT TCCATACGAA TTTTAGGGCT [missing] TTCGATTACT
20801 GTGAATAATG CCATTGGAAT TTTGATGGAG ATTGCATTGA ATCTTTGGGT
20851 AGTATGGATA TTTTAACAGT ATTAATGCTT CCAATTAATG AACACAGGGT
20901 ATTTTGCAAT TTGTGTTTTC TTCAATTTCT TTCACCAGTG TTTTTTTCTT
20951 AATTTAATTG TTTTATTTCC ATAGGGTTTG GGTAACAGGT GGTGTTTGGT
21001 TATGAGTAAG TTCTTTAGTG GTGATTTGTG AGATTTTGAT GCACCCATCA
21051 CCTAAGCAGT ATACACTGTA CCCAATTTGT AGTCTTGTAT CCCTCACCTC
21101 CCTCCCACCA TTTCCCCCAA GTCCCCAAAG TCCATTGTAT CATTCTTATG
21151 CCTTTGCATC CTCATAGCTT AGCTCCCACT TATGAGTGAG AACATATAAT
21201 GTTTGGTTCT CCATTTCTGA GTTACTTCAT TTAGAATATT GGTCTCCAAT
21251 TCCATCCAGA TTGCTGCGAA TGCCTTTATT TTGTTCCTTT TCATGGCTGA
21301 GTAGTATTCC ATAGTATATA CATCCCACAA TTTCTTTATC CATTCTTGAT
21351 TGATGGGCAT TTGGACTGGT TCCATGTCTT TACAATTGCG AATTGTGCTG
21401 CTACAAACAT GCAGGTGCAA GTGTCTTTTT CATATAATGA CTTCTCTTCC
21451 TCTGGGTAGA TACCCTGTAG TGGGATTGCT GGATCAAATG GTAGTTCTAC
21501 TTTTAGTTCT TTAAGGAATC TCCACACTGT TTTCCATAGT GGTTGTACTA
21551 GTTTACATTC CCACCAACAG TGTAGAAGTG TTCCCTGTTC ACTGTATCCA
21601 CACCATCATC TATTATTATT TGATTTTTTG ATTATGGCCA TTCTTGCAGG
21651 AGTAAGGTGG TATTGCACTG TGGTTTTGAT TTGCATTTCC CTGATCATTA
21701 GTGATGTTGA GCATTTTTTC ATATATTTGT TGGCCATTTG TACATCTTCT
21751 TTTGAGAATT GTCTATTCAT GTCCTTTGTC CATTTTTTGA TGGGATTATT
21801 TGTTTTTTTC TTGCTAATTT GAGTTCCCTG TAGATTCTGG ATATTAGACC
21851 TTTGTTGGAT GTGTAGGTTG TGAAGATTTT CTCCCACTCT TTGGGTTGTC
21901 TGTTTACTCT GCTGATTATT TCTTTTGCTG TGCAGAAACT TTTTAGTTTA
21951 ATTAAGTCCC ACCTATTTAT CTTTTCGTTG TTGTTGTTTT TTGGGGTTGT
22001 TTTGTTTTGG CTTGGTTTTG CATCTGCTTT TGGGTTCTTG GTCATGAAGT
22051 CTTTGCCTAA GCCAATATCT AGAAGGGTTT TTCTGATGTT CTAGAATTTT
22101 TATGGTTCAG GTCTTAGATT TAAGTCCTTG ATCCATCTTG AGTTGATTTT
22151 TGTATAAGGT GAGAGATGAG GATCCAGTTT CATGCTTCTA CATGTGGCTT
22201 GCCAATTATC CCASTACAAT TTGTTGAATA GGGTTAATAT TTAAAGCTTT
22251 ATATATTTAG GTGTTCCTAT TTTGGGTACA TATTTATTTA CAACTATCAT
22301 ATCCTCCTGA TGGATTGACC CCTTTCTCAT TATATAATGG TCTTCTTGTC
22351 TCTTTTTACA GTTTTTGTCT TAAAGCCTAA TTTGTCTGAT AAAAGTTCAG
22401 CTACCTTTGC TCTCTTTTGG TTTCTATTTG CATGGAATAT TTTTTTCCAA
22451 CCCTTCGCAT TCACTCTATG TGTGTTCTTA AAGATGAAAT GAGATGCTGT
22501 AGGGGCATAT GCTTGGGTCT TGTTTTATTC ATTCATTCAG CCACCCTTTT
22551 GATTAGAGAA TTTAATTCAT TTGTATTCAA GGTAATTATT GACAGACAAG
22601 GACTTACTAC TGCCATTTTG TTAATTGTTT TCTTGATGTT TTATAGATCT
22651 TTTGTTCCTT TCATCCTCTC TTACTCTTTT CCTTTGTGAT TAGGTGCTTT
22701 TCTCTAGTGG TGTACTTTGA TTTTTACTTT TTATCTTTTG TTGCTCTACT
22751 ATAGGTTTTT GCTTTGTGGT TACCATGAGG GTTACATAAA GCATAGTTAT
22801 AAAAGGCTAT TTTAAACTGA TAACAGCTTA ACTTTCAACA CTTAAAAAAA
22851 CTATACACTT TTACTCTACC AACTGCCCTC CATTTTATGT CTTTGATGTC
22901 ATAATTTACC TAGTTTTGGA GATGTGTCCC CTTATTGTGT ATCCCTTAAC
22951 AAATTATTGT AGCAACAGTC ATTTTTAATA GTTTTGGCTT TTAACTTTAT
23001 ACTAGAGATA GAATTAATTA ACATACCACC ACTACATTAT TAGGGTATTC
23051 TAAATTGACT ATGTATTTAC CTTTATCAGT GAGATTTTTG TTTTCAATTT
23101 TCATGTTGTT AATTAGTATT CTTTCATTTC AACTTGGAGA ATTCACATTA
23151 GCATTTTTTG TAAGATGGGT CTAGTAGTGG TGAACACCCT CAACTTTTGT
23201 TTATCTGGAG ATGTCTTTAC CTCTGCTTCA TTTTGAAATA TAACTTTTGT
23251 TCCATGATTG AAATGGACAA AATTGTTTTT TTAATTATGC AAAGTGCCAG
23301 GGTAAGCAGA ATTACTCTTT TTTTTTTTTT CTGAGACCGA GTTTCACTCT
23351 TGTTGCCCAG GCTGGAGTGC AGTGGCGCAA TCTCTCAGCT TACCGCAACC
23401 TCTGCCTCCC AGGTTCAAGC GATTCTTCTG CCTCAGCCTT CCTGAGTAGC
23451 TGGGATTACA GGCATGCACC ACCATGCTCG GCTAATTTTG CATTTTTAGT
23501 AGAGACGGGG TTTCTCCATG TTGGTCAGGC TGGTCTTGAA CACCCGACCT
23551 CAGATGATCC GCCCACCTAG GCCTCCCAAA GTGCTGGGAT TGCAGGTGTG
23601 AGCCACTGCG CCTGGCCAGA ATTACTCTTA TTTATCCTGA GCTTGAGGAA
23651 GAAAGAATTC AAAATTAAAA TTTCACATTA CCTAATGGCC AAAGCCTGCA
23701 TTCAAAATAA GTT^TCAGAA AAACATATAA AAACACAATA AGATAAACAG
23751 ACTAAATATA TGCAGTCATT TTATGGAACC AATCTGACTA GATTGGATGC
23801 AGACTAGGTA GGATGCAAAT TTAAAAAAAA CTTTATTCTT CTTCCACTTA
23851 TAAACTTTAA ACCTGCTTTG TGGAGCAAGT TCTTTTTATC TCTGGGGAAA
23901 GATCCTGAGT AAGTCTCATA GAGTTCTCAT TCATTTAAAT CACAAGAACA
23951 ATCTTAGGTC AGTAATTAAA CTATCTGGCC CAGTGTAATA CTGAAACTTT
24001 CAAATACTTA TCCACTTGAG CTCTTCTTTC CATCCCAGCT TGGTACTTCT
24051 TTGGTCCTAG AAGCCAGCAG TGGTTTATCA TCGACTTATT CTTACTGACT
24101 AGCTCCCCAA TACCCAGTAG CTGCTGTTTC TGGCCCCTCC AGGAATGGTT
24151 TTAGGAGGAA AGGGGATAAG GAGTAAAGGG CTGGTACTAT TGTGATCATG
24201 CCAAAGGGCT TGGTGGATAT TCCATGCTTC CCTTTCTCTC AAGAGGAAAC
24251 TCCCTTTCTT GGAGACTCTC TCACTAGAAC TTTCCAGAGG TGATTCAGGG
24301 GACAAGAGAA TAATTGTCCT TAGGCAGACT CTTTTTCAAG CTGGTCCCAG
24351 AGCTTTCCCT CTTGCCAGTT AATTGGTTTA AGGACACAGT TGCACATCCT
24401 TGCCTTGCCT CTGCTGCTGT CCTCTGCCTT TCTGTCTGTT CTGAGTTATA
24451 GCCTTTCACA TCAGTCCTGT ACTCCCCAAA CTCCAAGGAG CACAAGTCAG
24501 ATCATCTAAG TGATCCTCTT GAAGCCTCTT GTTTAAGATG GGGGAAGCAC
24551 CCTTCCTTTT CCATGGCACT CTGGCATTCC AACAACACTT TAAATAATTT
24601 TTTCTCTCAA AATTCTTAAG CCTCTCCTCT TTAATCCTTC GCCATTTTTA
24651 TGTATTATTA CTTTATATGA TGAGCTAAGA GTTACAAAAC TGGTTTTTAG
24701 AAATCTCCTT AGCAAATGTT TTACTGCTAG TTTAGCAGCT CACTTTATAA
24751 TAAGGATATA TGATATATTT CTTTGGTTCC TCTGCCTCTG GGACCTCAGC
24801 TCATCCTGAG GCAGAGAGTC CCATTTTAAC ATTCTGTTAC ATAAACCAGT
24851 GGCAAAATGG CTTTAACCTG AGGGTAATAA TTACCAGGAA CAAACAGAAA
24901 ACAGAAAAAA AGTAAACTGG TTATGATATC TGAGTCCCTT CCCTCCCTCA
24951 TCCTCACAGG GACCAGCTGG GTGAGATGTC GTACACCACA ATGTGCATCA
25001 AGGAGACGTG CCGATTGATT CCTGCAGTCC CGTCCATTTC CAGAGATCTC
25051 AGCAAGCCAC TTACCTTCCC AGATGGATGC ACATTGCCTG CAGGTCTTTA
25101 CATTCTTTTC CTAAGCAGTT CTTAGAGGCT ATGGGATCCT GGAGACCACA
25151 GTGACAAAGA TTAGTGAGTC TCTTAGCACT TGGAGAAGTC AAAAGATAAT
25201 GCTAACATGT GACTTAGGTT TTATCACCTA TGAGGAGCTC AGAGGATAAT
25251 GCTTTGGTCA GACATGAATT TCAATGACTT TCCCAAAGGC ACATAGCCAG
25301 TTGCAGCAAA GCTAAGCCCA GAATCCATGT CTCTGGAATC CCAGCCCAGG
25351 GTCTCTTCCA TTGTGGGACA TCATTTGTAA GATAATCTTT GTTTGGCTGA
25401 GTTTGAGACC GAGCTGAAAC TTCATGGAAA ATAGCACCAG CATCTTTATC
25451 TGAAAGACCA AGGGGGATCT TTGGCCTCAT CATCATAATA TCACCCTTAT
25501 AAATATACAA CATTTAATAG TTAATATAGA GCCTTCAGAC CCATTATCTC
25551 ATTTTTCCCC TTGGAATCCA ATGTTAACAG ATGCTTATAC AATGATTTAC
25601 AGTTCACTGA ACACTTTTAA GTACTTTCAA TGTGGCCCAA AATCCAGAGG
25651 CAGCCCCAAT GTGTAGATGA CATTAACTGA TGTGAGCAGA GCTAGAACTT
25701 GTGCGGAGAC CCTGAGTCTG GAGCCTAGAG TTCTTCGGAA CAACACAGGT
25751 TTCTGAGCAG GGCTTATAGG AAGCAGAGGG GTCATGTGAG ACATATTATC
25801 TGATTCAATG TTCTATTAAT TCATGTCTTA GGAAGCAAGC CAACAGGATT
25851 GCTTCTGGCA AACACCTACA GCCTGTTACT GTAACTTTGC TGACAGACCC
25901 AGAATTAATT TCTGGAAGCT AGAATTATTT CTGGAAACCA AATAACCCTC
25951 ACATTCTCTC TCCTTTGTTT TGTACTCTGT TTCTCCCCAA ACCACATGGA
26001 TATTTGCCAA AATTCTCCAC TTTCCATATG TGAATAGCAC CAATGGAAAT
26051 TTGTCATGGG ATCTGCATGA CAGAATCACA GTTCTGTGTG TGTGTGTGTG
26101 CGTTTTCCTC TCAAGACAGA GTCTTGCTAT GTAGCCCAGG CTGGAGTACA
26151 GTGGCGTAAT CTCGGCTCAC TGCAACCTCT GCCTCCCAGG TTTAAGCAGT
26201 TCTCCTGCCT CAGCCTCCCG AGTAGCTGGG ATTACAGGTG CACACCACGC
26251 CTGGCAAATT TTTGTATTTT TATTAGAGAT GGGGTTTCAC CATGTTGGCC
26301 AGGCTAGTCT CAAGCTCCTG ATCTCGAGAC CAGCCCTCCT CAGCCTCCCA
26351 AAGCGCTGGG ACTACAGCCA TGAGCCACTG CACCCAGCCA GTTCTGTGCT




WO 02/34922

PCT/US01/42528

13/23



26401 TTTATACCTA AATTGTCTCC AGGAGTGCTT AATAGTCCAT TAATAGGTAT
26451 TTAGGCCAGG CACAGTGGCT GACGCATATA ATCCCAATAT TTTGTGACAC
26501 CAAGGTGGGA AGACTGCTTG AAGTTAGGAG TCTGAGACTA GCCTGGGCAA
26551 CATAGGGAGA CCCTGTCTTT ACT^^AAAAAA AAAAGAGAGA GATAGCCAGG
26601 CATGGTGTTG CATGCTTGTA TTCCTGCCTA CTTGGGGGAC TGAGGCAGGA
26651 GGATCACTTG AGCTCAGAAG TTCAAGGTTA CCGTGAGCAA TGTTCACGCC
26701 ACTGCTCTCC AGCCTGATTG ACAGGCCAGA CCCTGACTCT AAACAAAAAC
26751 AAAAAACAAA TATTTAAGTA ATTTCCAAAC ATAGCAGAAA ATATAAGCAT
26801 GGTTTATCAC TTTGATATGA CACCAACAGC TACTTAAGAT AGAGTCATGA
26851 ATTCAGTAAA TTGTTGTGTG GAAAGCTAAG GTGCCAACCC AAGCCGCATC
26901 TTCTTAGGTG CTCCTCACTG GTGTCATCAG CTACAGCAGG CAGAGCATTG
26951 CCAGGAGCTA GCTCTTCCCT TCAAGAACAA AAGTCTTGTT TAAGAGCACA
27001 GTAGCCCACA ACTTGCTCTT TCTCCTGCAG TCTCTTTTAT TTCCCTCCTT
27051 TCTTAGGGAT CACCGTGGTT CTTAGTATTT GGGGTCTTCA CCACAACCCT
27101 GCTGTCTGGA AAAACCCAAA GGTATGATTC TCTCTTGTAC ATAAATACTT
27151 CCAAGAACTA ATGCTGTGCA AGTCACTTTT TGGTAGCTAA GCACAGAAGT
27201 GGCTATATAA TTAAGGGAAA TGACACAAAT TAAACAAAAA TAAACATAAA
27251 AGCCAAAAGA AATGTAAAAC TATTCTATGT TCTTGAAACA CTCTTGACGT
27301 GTATCAGTGA TTTCTTTCAT GTAAGCCACT AAGGTTTAAG ATCTATTACT
27351 TGTAACAGGA AGCTGGAGTA TATGTCTCTG TAATAATTGG CCACATCATC
27401 ATTTTGACTT GATTTCTAAG TGGATGCACA TCCATTTCTA AGTGGATGTA
27451 TCTCCATAGT GAAAATAATA CCACTTGCCA TAGTATTTTT GTTTGCCTGG
27501 GTATCAGACA AATCAGCTGT GAAGCTGCAA GGTCTGCAGG TCTGAAGGTA
27551 CACTGCCCAG TGTAGTAGCC ACGGGCCACA TACGGCTACT GAGCACATGA
27601 CATGTGGCCA GTTGGAATTG AGTTGTGCTG TAAGTTTAAA ATACGTGCTG
27651 GATTTTGAAG ACATAGTACC CTAAAAAAAT GTGAAACATT TCCTTTTAGT
27701 AATTATTTAT ATTGATTACA GGTTGGAATG GTAATTTTTG GTTAAATAAA
27751 CTCTATTAAG ATTAACTTCA CCTTTTAAAA ATGTGACCAC CAGAACATTT
27801 TAAATTACAC ATGTAGATCA CATTATATTT CTATTGATCG GTGCTAGGTG
27851 GTAGGTGAAG AAATGTGTTC ATGTTGTTTG GGGGATGGTG TTGGGGTTGT
27901 CCTCTCATTT CAGGTCTTTG ACCCCTTGAG GTTCTCTCAG GAGAATTCTG
27951 ATCAGAGACA CCCCTATGCC TACTTACCAT TCTCAGCTGG ATCAAGGTGA
28001 GAACAATTTG AAGTTGCTGA AAGTACCCAA AGATGTTTAC TTGAGAGTAG
28051 TTTATTCCTT TCAGCTCCTC AGCTCTATAC ATTCTTCCAG GGAACCGTAG
28101 ATCTTGGTGC CTATTTGAGC CCCAAAGGAT CAGTTAGTTT TACAAAGGAC
28151 AATCGTATTC TCTGTCACAT CCTTTTTGGC CATGCCTCAA AAGCAGTCCC
28201 ACAATGTAAG CTACTGCTCA TAGGCTCAAT GCAGTCCACC TTCAAAGCAA
28251 GAGAAATAAT TTCATGAGTA ACTCCAACTG CCGCCTTGTT ATAGGGAAGG
28301 CATCATGTTG GAGCCTCCCA GCTCAAATTC TCACAGTGAA CAATTTAAGT
28351 CTAAAGTTCA AAAGTTTCAA TGGCATTTGG TGGAAAAAAT ATCACTTTAC
28401 TGTGTACTTC AGACTTCTTG TACTAGTATT TTACTATAGT CAGAAGAAAC
28451 ATCATTTTTT CAAGTATCAC TTTCTTTCCC TCTTGTCTTC AGGAACTGCA
28501 TTGGGCAGGA GTTTGCCATG ATTGAGTTAA AGGTAACCAT TGCCTTGATT
28551 CTGCTCCACT TCAGAGTGAC TCCAGACCCC ACCAGGCCTC TTACTTTCCC
28601 CAACCATTTT ATCCTCAAGC CCAAGAATGG GATGTATTTG CACCTGAAGA
28651 AACTCTCTGA ATGTTAGATC TCAGGGTACA ATGATTAAAC GTACTTTGTT
28701 TTTCGAAGTT AAATTTACAG CTAATGATCC AAGCAGATAG AAAGGGATCA
28751 ATGTATGGTG GGAGGATTGG AGGTTGGTGG GATAGGGGTC TCTGTGAAGA
28801 GATCCAAAAT CATTTCTAGG TACACAGTGT GTCAGCTAGA TCTGTTTCTA
28851 TATAACTTTG GGAGATTTTC AGATCTTTTC TGTTAAACTT TCACTACTAT
28901 TAATGCTGTA TACACCAATA GACTTTCATA TATTTTCTGT TGTTTTTAAA
28951 ATAGTTTTCA GAATTATGCA AGTAATAAGT GCATGTATGC TCACTGTCAA
29001 AAATTCCCAA CACTAGAAAA TCATGTAGAA TAAAAATTTT AAATCTCACT
29051 TCACTTAGCC GACATTCCAT GCCCTGACCA ATCCTACTGC TTTTCCTAAA
29101 AACAGAATAA TTTGGTGTGC ATTCTTTCAG ACTTTTTCCT ATACATTTTA
29151 TATGTAGAAA TGTAGCAATG TATTTGTATA GATGTGATCA TTCCTATATT
29201 GTTATTGATT TTTTTCACTT AATAAAAATT CACCTTATTC CTTATCATTG
29251 CTTTATGGTA TTCTGTAATA TGAATGTACT ATAATTTATT TAACTATTTT
29301 CCTTATTGGG CATTTAAGTT ATTTCTAGTT TTATU^AACAT GCTTGTCAAT
29351 GGCAACAAAA GCCAAAATTG ACAAATGGGA TCTAATTAAA CTAAAGAGCT
29401 TCTGCACAGC AAAACAAACT ACCATCACAC TGAATGGGCA GCCTACAGAA
29451 TGGGAGAAAA TTTTTGCAAC CTACTCATCT GACAAAGGCC TAATATCCAG
29501 AATCTACAAT GAACTCAAAC AAATGTACAA GAAAAAAACA ACCCCATCAA
29551 AAAGTGGGTG AAGGATATGA ACAGACACTT CTCAAAAGAA GACATTTACG
29601 CAGCCAAAAG ACACATGAAA AAATGCCTAT CGTCACTGGC CATCAGAGAA
29651 ATGCAAATCA AAACCACAAT GAGATACCAT CTCACACCAG TTAGAATGGC




29701 AATCATTAAA AAGTCAGGAA ACAACAGGTG CTGGAGAGGA TGTGGAGAAA

29751 TAGGAAGACT TTTACACTGT TGGTGGCAGG AGAATCACTT GAACCCGGGA

29801 GGGGGAGGTT GCAGTGAGCC GAGGTGGCGC CACTGCACTC CAGCCTGGGC 

29851 GACAGAACGA GTACTCCATC TCAAAAAAAA AAAAAAAGGA CACCAAACTT 

29901 CTCAATCTTA ATGTTGTCAT CTATGTGGTA TCTTCCATAA TCTCTCTCAG 

29951 ACAGAGTCAT CTTTTGCTGA TATGATCTTA CAGTATTTTT TGTTTATACC 

30001 ATTATAATCT CATTAATTGC AGCAACACAA ATGACAAAAG ACAACTGATT 

30051 TCTCCCCTTG GATGACCTAA TTTGCTTTCA CTCTTCCATC ATCACTTATA 

30101 ACATGATGAT TCTCAAATTC ATCTACCTAA AATCTATATA TAAAAAAATC 

30151 CCTCCCTTGA ATTCCAGATC CTTGGAGACA AACACCCACG TCTAAAACCA 

30201 AATTTGTTTA ACACTGGACC AGTCGTCCTG TGTGACTTTC CATTTTGTCA 

30251 CTATTTTGTC AGCTGGTATA CCAATATCCA CCCAGTTAAA CAATATTTCC 

30301 TTGTTTTTTT CTGGTACAAA CCCAAATAAA TTACAAACAT CAATAAAAGT 

30351 AAAATTCTAA AATAACTCAC TTTCTCTATA TATCTCCTTC TTGCTGGAAA 

30401 AATGGGTTAG GTTAGTTCTT TAAAAGCATG CATGATAAAT TGTACTGAAT 

30451 ACAATATTCA GGTCTGGACA TACTAGGTAT AATTTTCTGT GTCTCTGGGG 

30501 TCTTACCTAT TTGGGGTCAA AATAAACAAG TTTATTAAGC TTATTAATAT 

30551 TCAATTTCAT TATCTTCTTT AACAATTATG TTCCCTGGTA CTTTCATTGC 

30601 CAATAATTTA TTTGTCAGGT TGCCAGGTGC TTCTAAACTT CTGTGTATTT 

30651 TTTCATATCC AATTTTACTT TAAATATTTT TAGAAAAGAG GTCTGTTAAA 

30701 TTTCCTAATA ATTATTATAT TATTGTTTTT TCACTGACAT TTTGTGAATT 

30751 GAAAACCCTT AAAAATATGA AATCATTTTT TCGAAATATG TGCCACAGAC 

30801 AATTTTGTTA AATAAGAAGA CAGAAACAGG GCATTATCAA GAGATAAATA 

30851 TTCAATATAC CTTATATTTC TGTCACACAT TTTTATACCA ACTGTGCCAA 

30901 AAATTGTATA TCATATAAAT GATAACAAGT TCACAAAGGC ATTCCTTTAT 

30951 CCCTTAACTC TCAAATTAGA AACTTTCATA GGTAGGAAGT AGGGGAAGCA 

31001 TATATTCCCT TTGAAAGGTG CAAGAAAATG TCATTGGCAT TCACCATGGT 

31051 ACTCTTCAAG CTTAAAAAAA ATGGACTGCA AAACATTTAC AAACATAGCA 

31101 TATTTATTGG GTACCTTTAT GTTTACATAA ATATTGAAGA TATCTCACAT 

31151 ACCTCTTTCA ATCAGATTAT CTCACTGACA TTTATTGACC ACTTTCTATG 

31201 GGGAAAAC 
Position    Alleles

267         T    C
284         G    A
1269        T    C
2487        T    C G
4486        G    A
4522        G    A
4522        C    A
5075        T    G C
5450        T    C
5450        T    C
5995        G    A
6241        G    A
8479        C    T
10045       C    A
10045       G    A
11994       G    A
14070       A    G T
15535       T    C
17618       C    T A
18520       A    - C
18525       -    T A
18525       -    G A
19189       T    C A
19259       C    T
19325       G    T
19346       G    T
20845       T    [illegible]
20845       T    [illegible] C
22234       T    C
22234       G    [illegible]
22247       C    [illegible]
22334       A    [illegible]
23033       A    [illegible] C
26407       C    A
26473       C    T
26844       G    A
28384       A    [illegible]
28417       A    C
29265       A    G
29484       A    G
30417       T    [illegible]
30783       C    G

Context:

DNA Position

267 CCAGCCTCTCTTAGGCTCCTAAATATAGTGCAAAAAGTTCCAGAGTTCCTTTGTTACCCA 
TGAAAGCACATGGAACGGTGCTGGACAGGGGCAACTGGCCCTGGAGCAGAGGAGTAACTG 
CATAGAACTGTCCAAGCCTCAGAGGGAGTCACACCACCAGCAAGAACCTGGGTGGGAGTA 
GGTGAGCCAAGGGGTTCCCAGGCTCTGACCCTGCCAAGAGAACTCATTAGAAGGTCACCA 
ACCACACATACTATTCCTCGGTCTCA 
[T,C]

GAAGAACCCAGGGACCGGACCAGGCAAGATATCACAAAGCTGAAGTTTCAGCTCTGGGGC
AGAGCATGGATCTGAGGTCTTTGGCCCTACCACCATGCGATCATATGAGGGCCATCATAC 
AACCATCATGATTTGGGGGAGGAATAGGGCATAGAGGAATCATATGAAAAGCTGAAATGC 
CATGAGTTACCCAGAAGAAGCTGTGTAAGCCAGAGGATTCTGAGACCCTGTCAAATAACA 
ACATCTAGTTGAAGGTTGGAGTTAGGTAGGAGGTAGGGAAGTCTGGGAAAGAAGGAGCTG 

284 CCAGCCTCTCTTAGGCTCCTAAATATAGTGCAAAAAGTTCCAGAGTTCCTTTGTTACCCA
TGAAAGCACATGGAACGGTGCTGGACAGGGGCAACTGGCCCTGGAGCAGAGGAGTAACTG 
CATAGAACTGTCCAAGCCTCAGAGGGAGTCACACCACCAGCAAGAACCTGGGTGGGAGTA 
GGTGAGCCAAGGGGTTCCCAGGCTCTGACCCTGCCAAGAGAACTCATTAGAAGGTCACCA 
ACCACACATACTATTCCTCGGTCTCATGAAGAACCCAGGGACC
[G,A]

GACCAGGCAAGATATCACAAAGCTGAAGTTTCAGCTCTGGGGCAGAGCATGGATCTGAGG 
TCTTTGGCCCTACCACCATGCGATCATATGAGGGCCATCATACAACCATCATGATTTGGG 
GGAGGAATAGGGCATAGAGGAATCATATGAAAAGCTGAAATGCCATGAGTTACCCAGAAG 
AAGCTGTGTAAGCCAGAGGATTCTGAGACCCTGTCAAATAACAACATCTAGTTGAAGGTT 
GGAGTTAGGTAGGAGGTAGGGAAGTCTGGGAAAGAAGGAGCTGAAACACTTGCTGTGTGT 

1269 CCTGTCTTAATCACTTACCCGCCAAATAAAATCTGGCTCCAGAGAGTGGAGCGTAGGCTT 
AAGGAATTGGGGGCGGAAGGGCGGGGAAGGTGGGGGAGGGACAGTGATAGGGAGAACAGG 
GAATTGTAGCAGAAATTGGGTTTATTGTTCAGAGCTGTCAATGAACACTTAACATATGCC 
TGTCTTAGCCTAAATCAATGAATAAATGAATGAATAAATAAATGAATGAAATGTGGGCAA 
TGCCTATAAAGATTGCTGGGACAGGGAGGTGGGGGGAGACACCAGCTTGGGAAGTCAGGC 
[T,C] 

TGTTAGATCCTAGTTCACCACCTGATACGTTACAAATACTAAAACCATCACTTTCAAATT 
ATTTTTACTACATTTTCCTGTTATCTGTACTCGAGTTTATTTATGTTTCTGGCATCTAGA 
GTCAGCCCTTCATGGGCATGAGACCCAAGCAGCCACACGAGGCTCTGAACCCAGAAGAGC 
ATATGCTCGGTTTAATGGTCTGTCATCTTAGAATTGTTAATAAAGTTTTTATCCCGCATT
TTCATTTTGCACTGAGATTCATAAATTATATAGCAGGCCCTGACTGTACCTGTATAGTGG 

2487 AGCCATGGAATTCTCCTGGCTGGAGACGCGCTGGGCGCGGCCCTTTTACCTGGCGTTCGT 
GTTCTGCCTGGCCCTGGGGCTGCTGCAGGCCATTAAGCTGTACCTGCGGAGGCAGCGGCT 
GCTGCGGGACCTGCGCCCCTTCCCAGCGCCCCCCACCCACTGGTTCCTTGGGCACCAGAA 
GGTAAATGGAAGGGAAAAAGGNTAGAAAAGGAGGAAGAGGGGGGCGGAGGAGGATGCGGC 
AGAGGAGCCCAGCCGGCAGAGAGACGCAGCTTTCTTCCATCCCTGGGGACCCTCCGGCTT 
[T,C,G]

CACCGGCCTTTCCAGCCCGGCCTGTGGCTCTTAGCATCATTTTTCCTTGCTCTGGAGAAT 
TGCTTTCCCGCAGCCCCACAGGGAAAGGTCACAAAAGAGGAAGCTTTGGGGGCTGGGAGA 
GAGCTATTTAAAGAACCTGAATATGGAAAAAGAAAGCGAGCTGTAACTCAAGTCTGTCTC 
TCATTGCTTCACCAAGCCTTCCACATGTGTTGCTTTAAAAATAGCATGTTATTCTAAATA 
ACTTATTAGTTGCAGAAAATATGCAAAATCTATCCCAATCGTTGGCACCCTTAGTCCATT 

4486 TGTTATGTATCTCTACTGTCTCATGAATACTATGTCGTCTGTTGTTTTAATTGAATTGTT
TTGGCATCCTTGTCAAAAATCAATTGACCATAAATGTCAAGGTCTATTTCTGAGTCTTCA 
ATTCTAATCCATTGATCTATATGTCTATCCTAACTCATGGACACAGAGAGTAGAAGGATG 
GTTACCAAAGGCTGGGAAGGATAGAGGGGAGCTGGGGGAGGAGGTAGGGAAGGTTAATGG 
GTACAAAAAAAATAGAAAGAATGA 
[G,A] 

TAACACCTACTATTTGATAGCATAGCAGGGTGGCTATAGTCAATAATAACTGTACACTTT 
TAAATAAAGAGTGTAATAGGATTGTTTGCAACTCAATGGATAAATGCTTGAGGGGATGGG 
TACCCCATTCTTCATGATGTGCCTATTTCACATTGCATGCCTGTATCAAAAACATCTCAT
TTACTCCATAAATATATACACCTACTATGTATCCACAAGTATTAAAAATTATAAATAAAT 
AAATTATATAGCTATCCTTATGCTAGTACCACACTGCCTTACTGTTGCTTTGTAGTAAGC 

4522 TGTTATGTATCTCTACTGTCTCATGAATACTATGTCGTCTGTTGTTTTAATTGAATTGTT 
TTGGCATCCTTGTCAAAAATCAATTGACCATAAATGTCAAGGTCTATTTCTGAGTCTTCA 
ATTCTAATCCATTGATCTATATGTCTATCCTAACTCATGGACACAGAGAGTAGAAGGATG
GTTACCAAAGGCTGGGAAGGATAGAGGGGAGCTGGGGGAGGAGGTAGGGAAGGTTAATGG 
GTACAAAAAAAATAGAAAGAATGAATAACACCTACTATTTGATAGCATAGCAGGGTGGCT 
[G,A] 

TAGTCAATAATAACTGTACACTTTTAAATAAAGAGTGTAATAGGATTGTTTGCAACTCAA 
TGGATAAATGCTTGAGGGGATGGGTACCCCATTCTTCATGATGTGCCTATTTCACATTGC 
ATGCCTGTATCAAAAACATCTCATTTACTCCATAAATATATACACCTACTATGTATCCAC 
AAGTATTAAAAATTATAAATAAATAAATTATATAGCTATCCTTATGCTAGTACCACACTG
CCTTACTGTTGCTTTGTAGTAAGCTTTGAAATCAGGAAGTATGAGTCCCCCGCACTTTGG 

4522 TGTTATGTATCTCTACTGTCTCATGAATACTATGTCGTCTGTTGTTTTAATTGAATTGTT
TTGGCATCCTTGTCAAAAATCAATTGACCATAAATGTCAAGGTCTATTTCTGAGTCTTCA 
ATTCTAATCCATTGATCTATATGTCTATCCTAACTCATGGACACAGAGAGTAGAAGGATG 
GTTACCAAAGGCTGGGAAGGATAGAGGGGAGCTGGGGGAGGAGGTAGGGAAGGTTAATGG 
GTACAAAAAAAATAGAAAGAATGAATAACACCTACTATTTGATAGCATAGCAGGGTGGCT
[C,A] 

TAGTCAATAATAACTGTACACTTTTAAATAAAGAGTGTAATAGGATTGTTTGCAACTCAA 
TGGATAAATGCTTGAGGGGATGGGTACCCCATTCTTCATGATGTGCCTATTTCACATTGC
ATGCCTGTATCAAAAACATCTCATTTACTCCATAAATATATACACCTACTATGTATCCAC 
AAGTATTAAAAATTATAAATAAATAAATTATATAGCTATCCTTATGCTAGTACCACACTG 

FIGURE 3, page 12 of 19 



CCTTACTGTTGCTTTGTAGTAAGCTTTGAAATCAGGAAGTATGAGTCCCCCGCACTTTGG 

5075 TTTGTAGTAAGCTTTGAAATCAGGAAGTATGAGTCCCCCGCACTTTGGTATTTTCCAAGA 
TTATTTTGGCTGTTTGGAATCCTTGATTTCTATACAAATTTTAGACTCAGCCTATCAATT 
TCTACAAGGAAACCAGCTAGGGTTCTGCTTGGGATTGCACTGAATCTGTAGATCAGTTTG 
GGGATTATTGCCATCTTAAGAATATTAGGTCTTCTGATCCATGAACACAGAAAGCCTTTC 
CGTTTAGTTAGGTCATCTTTAATTTTTTTTGTTGTTTTTTTTTGTTTTTTGAGACAGAGT
[T,G,C] 

CTGCTCTGTCGCCCAGGCTGGAGTGCAGTGACGCAATCTCGGCTCACTGCAACCTCCGCC 
TCTCGGATTCAAGCGATTCTCCTGCCTCAGCCTCCCAAGCAGCTGGGACTACAGGCACAT 
GCCACCACACCAACTAATTTTTGTATTTTCAGTAGAGACGGGGTTTCACCATATTGGCCA 
GGCTAGTCTCGAACTCCTGACCTCGTGATCCACCCGCCTCACCCTCCCAAAGTGCTGGGA 
TTACAGGCGTGAGCCACCACTCCCGGCTTTCTTTAATTTTTTTTAACGATGTTTTTGTAT 

5450 GATTCTCCTGCCTCAGCCTCCCAAGCAGCTGGGACTACAGGCACATGCCACCACACCAAC 
TAATTTTTGTATTTTCAGTAGAGACGGGGTTTCACCATATTGGCCAGGCTAGTCTCGAAC 
TCCTGACCTCGTGATCCACCCGCCTCACCCTCCCAAAGTGCTGGGATTACAGGCGTGAGC 
CACCACTCCCGGCTTTCTTTAATTTTTTTTAACGATGTTTTTGTATTTTTCAAAGTATAC 
ATCTTGCATTTCTTTTGTTAAATTTATTTGTTTTGTTCTTTTTAATTTCATTTCAGACTA 
[T,C]

TTATTGCATTCATAGTGTTTTAGAGTCCACATTCCCTCTTGACTGTCACTAAGTTTTTTT 
TTTTCTGTTTTTGAGAGGTTTCTATCAGAATTTTGCAGATCAGAGATGACGGACATGTCA 
AACTGTCTAATATTACCAACCCTCCCCATTTATCAGATCAGGATCCTTTTGGTGATTCAC 
CATGCAGGGAAATCTAGTATCTAAGGCTCAAAAGGTGATACTGTTTTACATAGGCAGTAA 
CATTTTATTGCTACATAATAACTACATATTTATGGAGTACCTGTGATATTTTGATACGTG 

5450 GATTCTCCTGCCTCAGCCTCCCAAGCAGCTGGGACTACAGGCACATGCCACCACACCAAC 
TAATTTTTGTATTTTCAGTAGAGACGGGGTTTCACCATATTGGCCAGGCTAGTCTCGAAC 
TCCTGACCTCGTGATCCACCCGCCTCACCCTCCCAAAGTGCTGGGATTACAGGCGTGAGC 
CACCACTCCCGGCTTTCTTTAATTTTTTTTAACGATGTTTTTGTATTTTTCAAAGTATAC 
ATCTTGCATTTCTTTTGTTAAATTTATTTGTTTTGTTCTTTTTAATTTCATTTCAGACTA 
[T,C] 

TTATTGCATTCATAGTGTTTTAGAGTCCACATTCCCTCTTGACTGTCACTAAGTTTTTTT 
TTTTCTGTTTTTGAGAGGTTTCTATCAGAATTTTGCAGATCAGAGATGACGGACATGTCA 
AACTGTCTAATATTACCAACCCTCCCCATTTATCAGATCAGGATCCTTTTGGTGATTCAC
CATGCAGGGAAATCTAGTATCTAAGGCTCAAAAGGTGATACTGTTTTACATAGGCAGTAA 
CATTTTATTGCTACATAATAACTACATATTTATGGAGTACCTGTGATATTTTGATACGTG 

5995 TTATTGCTACATAATAACTACATATTTATGGAGTACCTGTGATATTTTGATACGTGCATA 
CAATGTGCAGTGATCAAATCAGGGTGTTTAGGGTATTCATCACTTCTAACATTTATTATT 
TATTTGTGTTTGGAACATTTCAAGTCTCTTCAAGCTCTTCAGAAATATTCAATACATTAT 
TGTTAACAGTGCTATTGAACACTGGAACTTATTCCTTCTATCTAAAGACAGTAACATTTT 
AAGTATAGTCATAAGGTTACAGAAGGATAAAGTGTGTATAGGGAAAATTCCCTACAAGAT 
[G,A] 

AGAATTTCATTCCTTACTCTTAGTAATACAGGTCTTCAAACATGCCAAGGATATTCCTCC 
CTTGGAGCTTTGAACATGCACGTCTGTGGTTATATTGCTCTCCCTGCAAATTATTCCTAA 
AAGAGGCTTGCCCTGACCATTCAGACTAAAATAGCACCTCTAGTACTCTCTATCTCCAAC 
CCTATTATTATTATCTTGGCCCTTATCACTCTCTGACACTATACTGTATACTCTTTTGCT 
TGTTCGTTTATTATCCACCACTAACTACAATATAAAATCTGTGAGAGGTAGGATCTTTGT 

6241 AGTCATAAGGTTACAGAAGGATAAAGTGTGTATAGGGAAAATTCCCTACAAGATGAGAAT 
TTCATTCCTTACTCTTAGTAATACAGGTCTTCAAACATGCCAAGGATATTCCTCCCTTGG 
AGCTTTGAACATGCACGTCTGTGGTTATATTGCTCTCCCTGCAAATTATTCCTAAAAGAG 
GCTTGCCCTGACCATTCAGACTAAAATAGCACCTCTAGTACTCTCTATCTCCAACCCTAT 
TATTATTATCTTGGCCCTTATCACTCTCTGACACTATACTGTATACTCTTTTGCTTGTTC 
[G,A] 

TTTATTATCCACCACTAACTACAATATAAAATCTGTGAGAGGTAGGATCTTTGTTTGCCA 
CTATAT^CCTAGTGCATGGTACAGTTCCTGGTGCATAATAGGTGCTCAATAAATCCTTTG 
TTGAATGCATAAATATATTAGGTGCTGAGAAAATTTATTTATTCAAAGATCAATTTACTG 
CATAGAATAGGCCAGGTGGTTTGACATTTATTCAATAGCCAACATATGGGACCTAGGATG 
TACATATGCAAGTGTGTGTGTGTATGTGTGTGTGCATCTGCATGTGTACTTGGATGTACT 

8479 AAAGCATGTTATGTCACTTCCAGAAAAGTCTCAGGCTCCTCTGCTTGTGTGACCTTATCA 
GGTCCTGAACTCAGCTTGTGTCTATAAGAGGGGACAGGTCCAGCTTGGCTGGCTAATTAC 
TTTTACTTTTTTCACTGCAGTTTATTCAGGATGATAACATGGAGAAGCTTGAGGAAATTA 
TTGAAAAATACCCTCGTGCCTTCCCTTTCTGGATTGGGCCCTTTCAGGCATTTTTCTGTA 
TCTATGACCCAGACTATGCAAAGACACTTCTGAGCAGAACAGGTAAGAAGAGGGGGAAAG
[C,T]

TCTGGGACCTATTCCTCCTAGAAGTGAAATGCATAAAACCCATAGGCAAGATTCCAAAGC
AAAGATTGGTTTGGGGCCTTTAAGAGACACAGCAGCAAGTATGGGGAGGTGACAGGTTTC
CTACCAATACTGAAGGGGATTCCCATATCCTCCCCAGTCCCTTGTCTTGTTCAGGTATGC
ATGGGCACGTTGAAGTCGGTATAACTTAAAGCCTAGCTGGCATTACCAGACTTGCCAGGC
AAGGCTTCCCTTGGCCTCTGTGGGTTTTATGACTTCAGTGTCAGCAACACTTCCCACTCC

10045 TCTGCTTGACTCTGCAGATCCCAAGTCCCAGTACCTGCAGAAATTCTCACCTCCACTTCT
TGGTATGTATGTGCAAATGAGAGGTATAACCCACTCTCATTCAAAGTCCCCTTTCCATAG
TAGAGCATGCCAAAGAAACTGAAATCTGAATTCAAAAGCACAAAGAGTGCAAGGTAGAGC
TATACTGAACGTTATCTAGGGGAAAGATTGAAGGGGAGCTCTAAGGTCAACACACCACCA
CTTCCCAGAAAGCTTCTTCATCCGTTTCTCTCCCACAAAGTCTTATTCTCAAGGCAGCAG
[C,A]

TACATGAATCTGTCCCCTCTCTCTTTAAAACTACAGCCTTGGCCAGGCACAGTGACTCAT
GCATGTAATCCCAGCACTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGGTCAAGATTTC
AAGACCAGCTGGGCCAACATGGTGAAATCCCATCTCTACTAAAAATACAAAAATTAGCCA
GGCATGGTAGCATGTAGGCCTGTAGTCCCACTACTTGGGAGGCTGAGACATGAGAATCGC
TTGAACCTAGGAGGTGG

10045 TCTGCTTGACTCTGCAGATCCCAAGTCCCAGTACCTGCAGAAATTCTCACCTCCACTTCT
TGGTATGTATGTGCAAATGAGAGGTATAACCCACTCTCATTCAAAGTCCCCTTTCCATAG
TAGAGCATGCCAAAGAAACTGAAATCTGAATTCAAAAGCACAAAGAGTGCAAGGTAGAGC
TATACTGAACGTTATCTAGGGGAAAGATTGAAGGGGAGCTCTAAGGTCAACACACCACCA
CTTCCCAGAAAGCTTCTTCATCCGTTTCTCTCCCACAAAGTCTTATTCTCAAGGCAGCAG
[G,A]

TACATGAATCTGTCCCCTCTCTCTTTAAAACTACAGCCTTGGCCAGGCACAGTGACTCAT
GCATGTAATCCCAGCACTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGGTCAAGATTTC
AAGACCAGCTGGGCCAACATGGTGAAATCCCATCTCTACTAAAAATACAAAAATTAGCCA
GGCATGGTAGCATGTAGGCCTGT

11994 GGTAAGTAAAGGGGGAAAGTGCTCTGTGCATTGCGAAATGCTCCCAGCAATGGACAGTAT
TAGGTATGTGTTTTGTGGGCCATGAAAATAAAAAATCAGTTTCTAAAAATTTAACCAATG
TACACGTACTTATTGAACAATAGGTGTCTGTAAAAAATTTGTTATGTTCTTTGAGTGATA
ATATTAATAAAAAGATCTGGTCCTCTGTCTTAGATATATTTTGAGATTTTATGGCAGCAA
ACCAAGTACCAAATGGTGATAGTTAGATAGTAAGTGCTGTAGATGTGTTTCATGGAGGGC
[G,A]

GGTCTGTACAAACCTACCCCAAAGTCTGAGGAAACTGAGAGGCTGAAGAAAAAGGCTGAC
AGTTTCTTAAAAAGAAACATTCAATAGAGGCTTTCAAACAAAAACCAT

14070 GGTCAGGCTTTGCTGGGGGCAGCTCCCTGCAACAGCTCCTCTCCACACTTGCTCTGTTTC
TCACTTTTGAATCCAAACGTTTTTGAAAATGTTCTGAGTTTATTTTAAAATGTGGCTATG
GTGGTTGAGAGCAGTGGCAGGGTACCTAGCAAGTTTGGAATTGAAGTTGGAGGAAGCCCT
GGGGTAAACCCCTTGTAATTATGGGTCTTGTGTCAATGATTGCTTTAATGGAACTCTGGT
CTGTTTGAAAGCAGAGTTATGGTAATAATTGAAAAGCCGCAGATCTTTAACTCAGCCATT
[A,G,T]

ACCATATATGCAGTTTTCTCCATGCTCCTTCTCACTCCGCTGGGTGTATTTTTCCCTTCC
TCGTGCCCTGTGTAAGCACATGGCTTATTTACTCATGTGATCTTTGGTTCCTGCTGGGTC
AGGGTTGTCTCCATTAGATCATAAAAACAGGGCCAGGCAGGAGCCTTCAAATGAAGGCAA
TTTGGTCATGGTGGTGGTGATGATGTTGGTCTTGACCTCCTGTGCCAGGATAAGTGGGAG
AAGATTTGCAGCACTCAGGACACAAGCGTGGAGGTCTATGAGCACATCAACTCGATGTCT

15535 ACTTACTGCTTTGTTTCAGTGTCATCTATAAAATGGAGATTAAAAAAGAACCTATCTCAT
ACATTTGTTGTTACGATGAGTGGGTTAATATATATAAAGCATTTAGGACAGTGCCTGGCA
CTGAATAGATGTTAAATGTAAAGTATAGTTATGTCAAATGTCTTTGCTTCCAGGAATTTT
GCAAGACACACCAACATATGCACACTTACACATACATATATGCATACATGCACATAGATA
TTATAAAGAGGACACTCAGAGAAGCAGGTTATAAACAATTTAAGGCATAAATGGGCATTA
[T,C]

AAATAGCAGCAGTTCCCAAGTCTTTCTGCATCATTGCACACACAGAAAATGTTAATGTTT
TTGTGCTTCATTGGAGTAAACAGGAATGGATTTGGGGGAAGCTATACAGAACTTTGTAAA
T^AAAAATCTTTACTTTTTAAATATTATACAATTATGATGAAAAAGCAAAATGCAAAGTGT
TAGGGAAAATATTAAATGTTAAATTTATTCAAAACTTAAAACCTTTTCAATTTTTTTTTT
TTTTTTTTTTTGAGATGGAGTCTCTATCACTCAGGCTGGAGCGCAGTGGTGTGATCTCAG

17618 GGTAAGTGGGAATGGGATGGGGAGACAAGAATAAAACCGATTGACTAAATTTAACTGTAC
TTTGAATTGATGAGCAGCTTCATGCAATTTGAGACAAAGAGAGAATTCTGCAACTGTGTC
GCTAGAGGAGGGTTAGTAAAGACTAAACGAACGATTTGACAAGATTTGAGGATTGTCATA
TGGATACATGGATTTTAGGGCATCATGAAAAAATGGTCACATGGATAAACGTAAAAATTA
TGATGATAAGGTCCTGGGAAATCTGGGAGTTTGAAGAGAATTTCTAGGGCCTGTTGATCG
[C,T,A]

GGGCCCTTTGTGCAAGGCCTGCTTTTCTTATCTAACCTTGGTTCTCCTTTATGCTTTGGG
CAGAATATGGTTTATACCACATATTTGTTGAACTGAATTAAAATTTAAACCCCTATTTAA
AGCTCTGATTTTTCCCCTCAAATCATTATTGTGGTTGTATCTCCAAACATTTATAAACTG
GCATTTTATTTAAAATATTTGTATTGTACTTTCTAGGATGAAAGTGGTAGCAGCTTCTCA
GATATTGATGTACACTCTGAAGTGAGCACATTCCTGTTGGCAGGACATGACACCTTGGCA

18520 ATTTATCCATAAATTGTCTGTCATTGGTTTTCTAATCAATGGTGTGTGAAATGTCTTATT 
TCTTTATTTCACCTTGGCTCTGATGCATTGGAAATGAGGACTTGATCCCTGGGCTGGCAC 
TTAGAACTTAAACAATAGGGTCCAAGTGGAGCTCCTCTTCTGAGAGAGCTGAATGATTAG 
CTGCATTATTTAAGGCTCATTTTAGACATCTCCCAGCCGCTTGTCACCAATTTTATTCCT 
CAGGATTGATTTTAGACTTCAGACATAATATTCGATGATATATACTATAGTTAAGTTTAG 
[A,-,C]

AAATATGGACTGAGGACATTTTAAATACTGAGACTTTTTTTATGACTACAATTTATTGTG 
GGCCCTGTCTTCGGTGAGCTAATGGTCTAATACAGGAGACAGGAGACAGACCTCCAAATT 
GCAGTGTAGCATAATGAGGGCAATGATAGAGATATGTGCTGGCTAACACAAAGACATAGA 
AGACAGGTACCTACCCTGGCATGGGAGCTCAAGGAGACTTCCTTGACATTTACGCTGACT 
GCAGGATAAGTAGGAGTTAGCCAGGTGGAAACTGTCATCTCTATCTTGCTAGACTTTAAG 

18525 TCCATAAATTGTCTGTCATTGGTTTTCTAATCAATGGTGTGTGAAATGTCTTATTTCTTT
ATTTCACCTTGGCTCTGATGCATTGGAAATGAGGACTTGATCCCTGGGCTGGCACTTAGA 
ACTTAAACAATAGGGTCCAAGTGGAGCTCCTCTTCTGAGAGAGCTGAATGATTAGCTGCA 
TTATTTAAGGCTCATTTTAGACATCTCCCAGCCGCTTGTCACCAATTTTATTCCTCAGGA 
TTGATTTTAGACTTCAGACATAATATTCGATGATATATACTATAGTTAAGTTTAGCAAAT 
[-,T,A] 

TGGACTGAGGACATTTTAAATACTGAGACTTTTTTTATGACTACAATTTATTGTGGGCCC 
TGTCTTCGGTGAGCTAATGGTCTAATACAGGAGACAGGAGACAGACCTCCAAATTGCAGT 
GTAGCATAATGAGGGCAATGATAGAGATATGTGCTGGCTAACACAAAGACATAGAAGACA 
GGTACCTACCCTGGCATGGGAGCTCAAGGAGACTTCCTTGACATTTACGCTGACTGCAGG 
ATAAGTAGGAGTTAGCCAGGTGGAAACTGTCATCTCTATCTTGCTAGACTTTAAGCATAT 

18525 TCCATAAATTGTCTGTCATTGGTTTTCTAATCAATGGTGTGTGAAATGTCTTATTTCTTT 
ATTTCACCTTGGCTCTGATGCATTGGAAATGAGGACTTGATCCCTGGGCTGGCACTTAGA 
ACTTAAACAATAGGGTCCAAGTGGAGCTCCTCTTCTGAGAGAGCTGAATGATTAGCTGCA 
TTATTTAAGGCTCATTTTAGACATCTCCCAGCCGCTTGTCACCAATTTTATTCCTCAGGA 
TTGATTTTAGACTTCAGACATAATATTCGATGATATATACTATAGTTAAGTTTAGCAAAT
[-,G,A]

TGGACTGAGGACATTTTAAATACTGAGACTTTTTTTATGACTACAATTTATTGTGGGCCC 
TGTCTTCGGTGAGCTAATGGTCTAATACAGGAGACAGGAGACAGACCTCCAAATTGCAGT 
GTAGCATAATGAGGGCAATGATAGAGATATGTGCTGGCTAACACAAAGACATAGAAGACA 
GGTACCTACCCTGGCATGGGAGCTCAAGGAGACTTCCTTGACATTTACGCTGACTGCAGG 
ATAAGTAGGAGTTAGCCAGGTGGAAACTGTCATCTCTATCTTGCTAGACTTTAAGCATAT 

19189 CTGGTCAAAGGGACAGAAAGACAGAAATGCTAAGGACAATTCAGCAGCAGACCAGATAAA
AAACACCATATTTCATATGCAAAAGTCAACTCAATTGAAACATTTGTAAAACCAAATTTG 
ACATTATAAAAGTATATCAGAGATCTCATTTTATAAGGAAATAGAAGCCCTTTCCTACCA 
TAAACTAAAGATTTAATCTATATAGCACAAAATACAATGTTGAGTAATCATTTTTAATTT 
ATTTTTTAACTGACAAAAATTGTGCATATACATGTTATATATATATGTATGTGTGTATAT 
[T,C,A]

TATATGATGTACAACATGATATTTTGATATATGTATACACTGTGGAATGACTAAATCTAT 
CAATGGACATGTTCATTAACTCATACTTATCATTTTTTTGTGGTAAGGACATTTAAAATC 
TACCCTCTTAGCAATTTTCAAGTATACAAATTGTTAGTAACTCCAATCACATATTGTACA 
ATGCATCTCCTAAACTTATGCCTCCTGTCTGACTGAAATTTTGTATCCTTTGACTAACAT 
CCCTGTAATCCCCCATTCTCCCACAGCCCCTGGTAACCACTGTTCTACTCTCTGCTTCTT 

19259 TTTCATATGCAAAAGTCAACTCAATTGAAACATTTGTAAAACCAAATTTGACATTATAAA 
AGTATATCAGAGATCTCATTTTATAAGGAAATAGAAGCCCTTTCCTACCATAAACTAAAG 
ATTTAATCTATATAGCACAAAATACAATGTTGAGTAATCATTTTTAATTTATTTTTTAAC 
TGACAAAAATTGTGCATATACATGTTATATATATATGTATGTGTGTATATATATATGATG 
TACAACATGATATTTTGATATATGTATACACTGTGGAATGACTAAATCTATCAATGGACA 
[C,T] 

GTTCATTAACTCATACTTATCATTTTTTTGTGGTAAGGACATTTAAAATCTACCCTCTTA 
GCAATTTTCAAGTATACAAATTGTTAGTAACTCCAATCACATATTGTACAATGCATCTCC 

FIGURE 3, page 15 of 19 

19325  19346  20845  20845  22234  22234 

TAAACTTATGCCTCCTGTCTGACTGAAATTTTGTATCCTTTGACTAACATCCCTGTAATC 
CCCCATTCTCCCACAGCCCCTGGTAACCACTGTTCTACTCTCTGCTTCTTTGAGTTTAAT 
GTTTTAGATTTCCACATGTGAGATCATGTGGAATTTGTCTTTCTGTGCCTGGCTTATTTC 

TCAGAGATCTCATTTTATAAGGAAATAGAAGCCCTTTCCTACCATAAACTAAAGATTTAA 
TCTATATAGCACAAAATACAATGTTGAGTAATCATTTTTAATTTATTTTTTAACTGACAA 
AAATTGTGCATATACATGTTATATATATATGTATGTGTGTATATATATATGATGTACAAC 
ATGATATTTTGATATATGTATACACTGTGGAATGACTAAATCTATCAATGGACATGTTCA 
TTAACTCATACTTATCATTTTTTTGTGGTAAGGACATTTAAAATCTACCCTCTTAGCAAT 
[G,T] 

TTCAAGTATACAAATTGTTAGTAACTCCAATCACATATTGTACAATGCATCTCCTAAACT 
TATGCCTCCTGTCTGACTGAAATTTTGTATCCTTTGACTAACATCCCTGTAATCCCCCAT 
TCTCCCACAGCCCCTGGTAACCACTGTTCTACTCTCTGCTTCTTTGAGTTTAATGTTTTA 
GATTTCCACATGTGAGATCATGTGGAATTTGTCTTTCTGTGCCTGGCTTATTTCACTTAG 
CATAATGTCATCCAAATTCATCTCTGTTGTCATAAATGACAAGATATTTGTCTTTTCTAT 

GAAATAGAAGCCCTTTCCTACCATAAACTAAAGATTTAATCTATATAGCACAAAATACAA 
TGTTGAGTAATCATTTTTAATTTATTTTTTAACTGACAAAAATTGTGCATATACATGTTA 
TATATATATGTATGTGTGTATATATATATGATGTACAACATGATATTTTGATATATGTAT 
ACACTGTGGAATGACTAAATCTATCAATGGACATGTTCATTAACTCATACTTATCATTTT 
TTTGTGGTAAGGACATTTAAAATCTACCCTCTTAGCAATTTTCAAGTATACAAATTGTTA 
[G,T] 

TAACTCCAATCACATATTGTACAATGCATCTCCTAAACTTATGCCTCCTGTCTGACTGAA 
ATTTTGTATCCTTTGACTAACATCCCTGTAATCCCCCATTCTCCCACAGCCCCTGGTAAC 
CACTGTTCTACTCTCTGCTTCTTTGAGTTTAATGTTTTAGATTTCCACATGTGAGATCAT 
GTGGAATTTGTCTTTCTGTGCCTGGCTTATTTCACTTAGCATAATGTCATCCAAATTCAT 
CTCTGTTGTCATAAATGACAAGATATTTGTCTTTTCTATGGCTAATTGTTAGTCCATTGT 

TGTTACTGGAACCTTTGTAGATCAGTTGACAATAAATGTGTGGGTGTATTTCTGGACTCT 
TTATCCTGTTTTATTAGTTTATATGTCTCTTTTTTTAGAAGCTCTATGCTGTTTTGGTGA 
CTAGAGCTCTGTAGTCAATTTCAGATCAGGTAGTATGATGCACTCCAGCTTTGCTCTTTT 
TGCTCAAAATTGCTTTGGCTATTTGAGTTTTTTTATTCCATACGAATTTTAGGGCTTTTT 
TTTTTTTTCGATTACTGTGAATAATGCCATTGGAATTTTGATGGAGATTGCATTGAATCT 

TGGGTAGTATGGATATTTTAACAGTATTAATGCTTCCAATTAATGAACACAGGGTATTTT 
GCAATTTGTGTTTTCTTCAATTTCTTTCACCAGTGTTTTTTTCTTAATTTAATTGTTTTA 
TTTCCATAGGGTTTGGGTAACAGGTGGTGTTTGGTTATGAGTAAGTTCTTTAGTGGTGAT 
TTGTGAGATTTTGATGCACCCATCACCTAAGCAGTATACACTGTACCCAATTTGTAGTCT 
TGTATCCCTCACCTCCCTCCCACCATTTCCCCCAAGTCCCCAAAGTCCATTGTATCATTC 

TGTTACTGGAACCTTTGTAGATCAGTTGACAATAAATGTGTGGGTGTATTTCTGGACTCT 
TTATCCTGTTTTATTAGTTTATATGTCTCTTTTTTTAGAAGCTCTATGCTGTTTTGGTGA 
CTAGAGCTCTGTAGTCAATTTCAGATCAGGTAGTATGATGCACTCCAGCTTTGCTCTTTT 
TGCTCAAAATTGCTTTGGCTATTTGAGTTTTTTTATTCCATACGAATTTTAGGGCTTTTT 
TTTTTTTTCGATTACTGTGAATAATGCCATTGGAATTTTGATGGAGATTGCATTGAATCT 
[T,C] 

TGGGTAGTATGGATATTTTAACAGTATTAATGCTTCCAATTAATGAACACAGGGTATTTT 
GCAATTTGTGTTTTCTTCAATTTCTTTCACCAGTGTTTTTTTCTTAATTTAATTGTTTTA 
TTTCCATAGGGTTTGGGTAACAGGTGGTGTTTGGTTATGAGTAAGTTCTTTAGTGGTGAT 
TTGTGAGATTTTGATGCACCCATCACCTAAGCAGTATACACTGTACCCAATTTGTAGTCT 
TGTATCCCTCACCTCCCTCCCACCATTTCCCCCAAGTCCCCAAAGTCCATTGTATCATTC 

AGAAACTTTTTAGTTTAATTAAGTCCCACCTATTTATCTTTTCGTTGTTGTTGTTTTTTG 
GGGTTGTTTTGTTTTGGCTTGGTTTTGCATCTGCTTTTGGGTTCTTGGTCATGAAGTCTT 
TGCCTAAGCCAATATCTAGAAGGGTTTTTCTGATGTTCTAGAATTTTTATGGTTCAGGTC 
TTAGATTTAAGTCCTTGATCCATCTTGAGTTGATTTTTGTATAAGGTGAGAGATGAGGAT 
CCAGTTTCATGCTTCTACATGTGGCTTGCCAATTATCCCAGTACAATTTGTTGAATAGGG 
[T,C] 

TAATATTTAAAGCTTTATATATTTAGGTGTTCCTATTTTGGGTACATATTTATTTACAAC 
TATCATATCCTCCTGATGGATTGACCCCTTTCTCATTATATAATGGTCTTCTTGTCTCTT 
TTTACAGTTTTTGTCTTAAAGCCTAATTTGTCTGATAAAAGTTCAGCTACCTTTGCTCTC 
TTTTGGTTTCTATTTGCATGGAATATTTTTTTCCAACCCTTCGCATTCACTCTATGTGTG 
TTCTTAAAGATGAAATGAGATGCTGTAGGGGCATATGCTTGGGTCTTGTTTTATTCATTC 

AGAAACTTTTTAGTTTAATTAAGTCCCACCTATTTATCTTTTCGTTGTTGTTGTTTTTTG 
GGGTTGTTTTGTTTTGGCTTGGTTTTGCATCTGCTTTTGGGTTCTTGGTCATGAAGTCTT 
TGCCTAAGCCAATATCTAGAAGGGTTTTTCTGATGTTCTAGAATTTTTATGGTTCAGGTC 
TTAGATTTAAGTCCTTGATCCATCTTGAGTTGATTTTTGTATAAGGTGAGAGATGAGGAT 
CCAGTTTCATGCTTCTACATGTGGCTTGCCAATTATCCCAGTACAATTTGTTGAATAGGG 
[G,T] 

TAATATTTAAAGCTTTATATATTTAGGTGTTCCTATTTTGGGTACATATTTATTTACAAC 
TATCATATCCTCCTGATGGATTGACCCCTTTCTCATTATATAATGGTCTTCTTGTCTCTT 
TTTACAGTTTTTGTCTTAAAGCCTAATTTGTCTGATAAAAGTTCAGCTACCTTTGCTCTC 
TTTTGGTTTCTATTTGCATGGAATATTTTTTTCCAACCCTTCGCATTCACTCTATGTGTG 
TTCTTAAAGATGAAATGAGATGCTGTAGGGGCATATGCTTGGGTCTTGTTTTATTCATTC 

22247 TTTAATTAAGTCCCACCTATTTATCTTTTCGTTGTTGTTGTTTTTTGGGGTTGTTTTGTT 
TTGGCTTGGTTTTGCATCTGCTTTTGGGTTCTTGGTCATGAAGTCTTTGCCTAAGCCAAT 
ATCTAGAAGGGTTTTTCTGATGTTCTAGAATTTTTATGGTTCAGGTCTTAGATTTAAGTC 
CTTGATCCATCTTGAGTTGATTTTTGTATAAGGTGAGAGATGAGGATCCAGTTTCATGCT 
TCTACATGTGGCTTGCCAATTATCCCAGTACAATTTGTTGAATAGGGTTAATATTTAAAG 
[C,T] 

TTTATATATTTAGGTGTTCCTATTTTGGGTACATATTTATTTACAACTATCATATCCTCC 
TGATGGATTGACCCCTTTCTCATTATATAATGGTCTTCTTGTCTCTTTTTACAGTTTTTG 
TCTTAAAGCCTAATTTGTCTGATAAAAGTTCAGCTACCTTTGCTCTCTTTTGGTTTCTAT 
TTGCATGGAATATTTTTTTCCAACCCTTCGCATTCACTCTATGTGTGTTCTTAAAGATGA 
AATGAGATGCTGTAGGGGCATATGCTTGGGTCTTGTTTTATTCATTCATTCAGCCACCCT 

22334 GTTCTTGGTCATGAAGTCTTTGCCTAAGCCAATATCTAGAAGGGTTTTTCTGATGTTCTA 
GAATTTTTATGGTTCAGGTCTTAGATTTAAGTCCTTGATCCATCTTGAGTTGATTTTTGT 
ATAAGGTGAGAGATGAGGATCCAGTTTCATGCTTCTACATGTGGCTTGCCAATTATCCCA 
GTACAATTTGTTGAATAGGGTTAATATTTAAAGCTTTATATATTTAGGTGTTCCTATTTT 
GGGTACATATTTATTTACAACTATCATATCCTCCTGATGGATTGACCCCTTTCTCATTAT 
[A,G] 

TAATGGTCTTCTTGTCTCTTTTTACAGTTTTTGTCTTAAAGCCTAATTTGTCTGATAAAA 
GTTCAGCTACCTTTGCTCTCTTTTGGTTTCTATTTGCATGGAATATTTTTTTCCAACCCT 
TCGCATTCACTCTATGTGTGTTCTTAAAGATGAAATGAGATGCTGTAGGGGCATATGCTT 
GGGTCTTGTTTTATTCATTCATTCAGCCACCCTTTTGATTAGAGAATTTAATTCATTTGT 
ATTCAAGGTAATTATTGACAGACAAGGACTTACTACTGCCATTTTGTTAATTGTTTTCTT 

23033 ATCTTTTGTTGCTCTACTATAGGTTTTTGCTTTGTGGTTACCATGAGGGTTACATAAAGC 
ATAGTTATAAAAGGCTATTTTAAACTGATAACAGCTTAACTTTCAACACTTAAAAAAACT 
ATACACTTTTACTCTACCAACTGCCCTCCATTTTATGTCTTTGATGTCATAATTTACCTA 
GTTTTGGAGATGTGTCCCCTTATTGTGTATCCCTTAACAAATTATTGTAGCAACAGTCAT 
TTTTAATAGTTTTGGCTTTTAACTTTATACTAGAGATAGAATTAATTAACATACCACCAC 
[T,-] 

ACATTATTAGGGTATTCTAAATTGACTATGTATTTACCTTTATCAGTGAGATTTTTGTTT 
TCAATTTTCATGTTGTTAATTAGTATTCTTTCATTTCAACTTGGAGAATTCACATTAGCA 
TTTTTTGTAAGATGGGTCTAGTAGTGGTGAACACCCTCAACTTTTGTTTATCTGGAGATG 
TCTTTACCTCTGCTTCATTTTGAAATATAACTTTTGTTCCATGATTGAAATGGACAAAAT 
TGTTTTTTTAATTATGCAAAGTGCCAGGGTAAGCAGAATTACTCTTTTTTTTTTTTTCTG 

23036 TTTTGTTGCTCTACTATAGGTTTTTGCTTTGTGGTTACCATGAGGGTTACATAAAGCATA 
GTTATAAAAGGCTATTTTAAACTGATAACAGCTTAACTTTCAACACTTAAAAAAACTATA 
CACTTTTACTCTACCAACTGCCCTCCATTTTATGTCTTTGATGTCATAATTTACCTAGTT 
TTGGAGATGTGTCCCCTTATTGTGTATCCCTTAACAAATTATTGTAGCAACAGTCATTTT 
TAATAGTTTTGGCTTTTAACTTTATACTAGAGATAGAATTAATTAACATACCACCACTAC 

TTATTAGGGTATTCTAAATTGACTATGTATTTACCTTTATCAGTGAGATTTTTGTTTTCA 
ATTTTCATGTTGTTAATTAGTATTCTTTCATTTCAACTTGGAGAATTCACATTAGCATTT 
TTTGTAAGATGGGTCTAGTAGTGGTGAACACCCTCAACTTTTGTTTATCTGGAGATGTCT 
TTACCTCTGCTTCATTTTGAAATATAACTTTTGTTCCATGATTGAAATGGACAAAATTGT 
TTTTTTAATTATGCAAAGTGCCAGGGTAAGCAGAATTACTCTTTTTTTTTTTTTCTGAGA 

23421 CTTTCATTTCAACTTGGAGAATTCACATTAGCATTTTTTGTAAGATGGGTCTAGTAGTGG 
TGAACACCCTCAACTTTTGTTTATCTGGAGATGTCTTTACCTCTGCTTCATTTTGAAATA 
TAACTTTTGTTCCATGATTGAAATGGACAAAATTGTTTTTTTAATTATGCAAAGTGCCAG 
GGTAAGCAGAATTACTCTTTTTTTTTTTTTCTGAGACCGAGTTTCACTCTTGTTGCCCAG 
GCTGGAGTGCAGTGGCGCAATCTCTCAGCTTACCGCAACCTCTGCCTCCCAGGTTCAAGC 
[A,G] 

ATTCTTCTGCCTCAGCCTTCCTGAGTAGCTGGGATTACAGGCATGCACCACCATGCTCGG 
CTAATTTTGCATTTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTTGAAC 
ACCCGACCTCAGATGATCCGCCCACCTAGGCCTCCCAAAGTGCTGGGATTGCAGGTGTGA 
GCCACTGCGCCTGGCCAGAATTACTCTTATTTATCCTGAGCTTGAGGAAGAAAGAATTCA 
AAATTAAAATTTCACATTACCTAATGGCCAAAGCCTGCATTCAAAATAAGTAATCAGAAA 

25582 CCCAAAGGCACATAGCCAGTTGCAGCAAAGCTAAGCCCAGAATCCATGTCTCTGGAATCC 
CAGCCCAGGGTCTCTTCCATTGTGGGACATCATTTCTAAGATAATCTTTGTTTGGCTGAG 
TTTGAGACCGAGCTGAAACTTCATGGAAAATAGCACCAGCATCTTTATCTGAAAGACCAA 
GGGGGATCTTTGGCCTCATCATCATAATATCACCCTTATAAATATACAACATTTAATAGT 
TAATATAGAGCCTTCAGACCCATTATCTCATTTTTCCCCTTGGAATCCAATGTTAACAGA 
[T,C] 

GCTTATACAATGATTTACAGTTCACTGAACACTTTTAAGTACTTTCAATGTGGCCCAAAA 
TCCAGAGGCAGCCCCAATGTGTAGATGACATTAACTGATGTGAGCAGAGCTAGAACTTGT 
GCGGAGACCCTGAGTCTGGAGCCTAGAGTTCTTCGGAACAACACAGGTTTCTGAGCAGGG 
CTTATAGGAAGCAGAGGGGTCATGTGAGACATATTATCTGATTCAATGTTCTATTAATTC 
ATGTCTTAGGAAGCAAGCCAACAGGATTGCTTCTGGCAAACACCTACAGCCTGTTACTGT 

26407 CCTCTCAAGACAGAGTCTTGCTATGTAGCCCAGGCTGGAGTACAGTGGCGTAATCTCGGC 
TCACTGCAACCTCTGCCTCCCAGGTTTAAGCAGTTCTCCTGCCTCAGCCTCCCGAGTAGC 
TGGGATTACAGGTGCACACCACGCCTGGCAAATTTTTGTATTTTTATTAGAGATGGGGTT 
TCACCATGTTGGCCAGGCTAGTCTCAAGCTCCTGATCTCGAGACCAGCCCTCCTCAGCCT 
CCCAAAGCGCTGGGACTACAGCCATGAGCCACTGCACCCAGCCAGTTCTGTGCTTTTATA 
[C,A] 

CTAAATTGTCTCCAGGAGTGCTTAATAGTCCATTAATAGGTATTTAGGCCAGGCACAGTG 
GCTGACGCATATAATCCCAATATTTTGTGACACCAAGGTGGGAAGACTGCTTGAAGTTAG 
GAGTCTGAGACTAGCCTGGGCAACATAGGGAGACCCTGTCTTTACAAAAAAAAAAAAGAG 
AGAGATAGCCAGGCATGGTGTTGCATGCTTGTATTCCTGCCTACTTGGGGGACTGAGGCA 
GGAGGATCACTTGAGCTCAGAAGTTCAAGGTTACCGTGAGCAATGTTCACGCCACTGCTC 

26473 CAACCTCTGCCTCCCAGGTTTAAGCAGTTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGAT 
TACAGGTGCACACCACGCCTGGCAAATTTTTGTATTTTTATTAGAGATGGGGTTTCACCA 
TGTTGGCCAGGCTAGTCTCAAGCTCCTGATCTCGAGACCAGCCCTCCTCAGCCTCCCAAA 
GCGCTGGGACTACAGCCATGAGCCACTGCACCCAGCCAGTTCTGTGCTTTTATACCTAAA 
TTGTCTCCAGGAGTGCTTAATAGTCCATTAATAGGTATTTAGGCCAGGCACAGTGGCTGA 
[C,T] 

GCATATAATCCCAATATTTTGTGACACCAAGGTGGGAAGACTGCTTGAAGTTAGGAGTCT 
GAGACTAGCCTGGGCAACATAGGGAGACCCTGTCTTTACAAAAAAAAAAAAGAGAGAGAT 
AGCCAGGCATGGTGTTGCATGCTTGTATTCCTGCCTACTTGGGGGACTGAGGCAGGAGGA 
TCACTTGAGCTCAGAAGTTCAAGGTTACCGTGAGCAATGTTCACGCCACTGCTCTCCAGC 
CTGATTGACAGGCCAGACCCTGACTCTAAACAAAAACAAAAAACAAATATTTAAGTAATT 

26844 TGGGCAACATAGGGAGACCCTGTCTTTACAAAAAAAAAAAAGAGAGAGATAGCCAGGCAT 
GGTGTTGCATGCTTGTATTCCTGCCTACTTGGGGGACTGAGGCAGGAGGATCACTTGAGC 
TCAGAAGTTCAAGGTTACCGTGAGCAATGTTCACGCCACTGCTCTCCAGCCTGATTGACA 
GGCCAGACCCTGACTCTAAACAAAAACAAAAAACAAATATTTAAGTAATTTCCAAACATA 
GCAGAAAATATAAGCATGGTTTATCACTTTGATATGACACCAACAGCTACTTAAGATAGA 
[G,A] 

TCATGAATTCAGTAAATTGTTGTGTGGAAAGCTAAGGTGCCAACCCAAGCCGCATCTTCT 
TAGGTGCTCCTCACTGGTGTCATCAGCTACAGCAGGCAGAGCATTGCCAGGAGCTAGCTC 
TTCCCTTCAAGAACAAAAGTCTTGTTTAAGAGCACAGTAGCCCACAACTTGCTCTTTCTC 
CTGCAGTCTCTTTTATTTCCCTCCTTTCTTAGGGATCACCGTGGTTCTTAGTATTTGGGG 
TCTTCACCACAACCCTGCTGTCTGGAAAAACCCAAAGGTATGATTCTCTCTTGTACATAA 

28384 CTTCCAGGGAACCGTAGATCTTGGTGCCTATTTGAGCCCCAAAGGATCAGTTAGTTTTAC 
AAAGGACAATCGTATTCTCTGTCACATCCTTTTTGGCCATGCCTCAAAAGCAGTCCCACA 
ATGTAAGCTACTGCTCATAGGCTCAATGCAGTCCACCTTCAAAGCAAGAGAAATAATTTC 
ATGAGTAACTCCAACTGCCGCCTTGTTATAGGGAAGGCATCATGTTGGAGCCTCCCAGCT 
CAAATTCTCACAGTGAACAATTTAAGTCTAAAGTTCAAAAGTTTCAATGGCATTTGGTGG 
[A,-] 

AAAAATATCACTTTACTGTGTACTTCAGACTTCTTGTACTAGTATTTTACTATAGTCAGA 
AGAAACATCATTTTTTCAAGTATCACTTTCTTTCCCTCTTGTCTTCAGGAACTGCATTGG 
GCAGGAGTTTGCCATGATTGAGTTAAAGGTAACCATTGCCTTGATTCTGCTCCACTTCAG 
AGTGACTCCAGACCCCACCAGGCCTCTTACTTTCCCCAACCATTTTATCCTCAAGCCCAA 
GAATGGGATGTATTTGCACCTGAAGAAACTCTCTGAATGTTAGATCTCAGGGTACAATGA 

28417 GAGCCCCAAAGGATCAGTTAGTTTTACAAAGGACAATCGTATTCTCTGTCACATCCTTTT 
TGGCCATGCCTCAAAAGCAGTCCCACAATGTAAGCTACTGCTCATAGGCTCAATGCAGTC 
CACCTTCAAAGCAAGAGAAATAATTTCATGAGTAACTCCAACTGCCGCCTTGTTATAGGG 
AAGGCATCATGTTGGAGCCTCCCAGCTCAAATTCTCACAGTGAACAATTTAAGTCTAAAG 
TTCAAAAGTTTCAATGGCATTTGGTGGAAAAAATATCACTTTACTGTGTACTTCAGACTT 
[A,C] 

TTGTACTAGTATTTTACTATAGTCAGAAGAAACATCATTTTTTCAAGTATCACTTTCTTT 
CCCTCTTGTCTTCAGGAACTGCATTGGGCAGGAGTTTGCCATGATTGAGTTAAAGGTAAC 
CATTGCCTTGATTCTGCTCCACTTCAGAGTGACTCCAGACCCCACCAGGCCTCTTACTTT 
CCCCAACCATTTTATCCTCAAGCCCAAGAATGGGATGTATTTGCACCTGAAGAAACTCTC 
TGAATGTTAGATCTCAGGGTACAATGATTAAACGTACTTTGTTTTTCGAAGTTAAATTTA 

TATGCAAGTAATAAGTGCATGTATGCTCACTGTCAAAAATTCCCAACACTAGAAAATCAT 
GTAGAATAAAAATTTTAAATCTCACTTCACTTAGCCGACATTCCATGCCCTGACCAATCC 
TACTGCTTTTCCTAAAAACAGAATAATTTGGTGTGCATTCTTTCAGACTTTTTCCTATAC 
ATTTTATATGTAGAAATGTAGCAATGTATTTGTATAGATGTGATCATTCCTATATTGTTA 
TTGATTTTTTTCACTTAATAAAAATTCACCTTATTCCTTATCATTGCTTTATGGTATTCT 
[A,G] 

TAATATGAATGTACTATAATTTATTTAACTATTTTCCTTATTGGGCATTTAAGTTATTTC 
TAGTTTTAAAAACATGCTTGTCAATGGCAACAAAAGCCAAAATTGACAAATGGGATCTAA 
TTAAACTAAAGAGCTTCTGCACAGCAAAACAAACTACCATCACACTGAATGGGCAGCCTA 
CAGAATGGGAGAAAATTTTTGCAACCTACTCATCTGACAAAGGCCTAATATCCAGAATCT 
ACAATGAACTCAAACAAATGTACAAGAAAAAAACAACCCCATCAAAAAGTGGGTGAAGGA 

GTGATCATTCCTATATTGTTATTGATTTTTTTCACTTAATAAAAATTCACCTTATTCCTT 
ATCATTGCTTTATGGTATTCTGTAATATGAATGTACTATAATTTATTTAACTATTTTCCT 
TATTGGGCATTTAAGTTATTTCTAGTTTTAAAAACATGCTTGTCAATGGCAACAAAAGCC 
AAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAACAAACTACC 
ATCACACTGAATGGGCAGCCTACAGAATGGGAGAAAATTTTTGCAACCTACTCATCTGAC 
[A,G] 

AAGGCCTAATATCCAGAATCTACAATGAACTCAAACAAATGTACAAGAAAAAAACAACCC 
CATCAAAAAGTGGGTGAAGGATATGAACAGACACTTCTCAAAAGAAGACATTTACGCAGC 
CAAAAGACACATGAAAAAATGCCTATCGTCACTGGCCATCAGAGAAATGCAAATCAAAAC 
CACAATGAGATACCATCTCACACCAGTTAGAATGGCAATCATTAAAAAGTCAGGAAACAA 
CAGGTGCTGGAGAGGATGTGGAGAAATAGGAAGACTTTTACACTGTTGGTGGCAGGAGAA 

ATTCATCTACCTAAAATCTATATATAAAAAAATCCCTCCCTTGAATTCCAGATCCTTGGA 
GACAAACACCCACGTCTAAAACCAAATTTGTTTAACACTGGACCAGTCGTCCTGTGTGAC 
TTTCCATTTTGTCACTATTTTGTCAGCTGGTATACCAATATCCACCCAGTTAAACAATAT 
TTCCTTGTTTTTTTCTGGTACAAACCCAAATAAATTACAAACATCAATAAAAGTAAAATT 
CTAAAATAACTCACTTTCTCTATATATCTCCTTCTTGCTGGAAAAATGGGTTAGGTTAGT 

CTTTAAAAGCATGCATGATAAATTGTACTGAATACAATATTCAGGTCTGGACATACTAGG 
TATAATTTTCTGTGTCTCTGGGGTCTTACCTATTTGGGGTCAAAATAAACAAGTTTATTA 
AGCTTATTAATATTCAATTTCATTATCTTCTTTAACAATTATGTTCCCTGGTAGTTTCAT 
TGCCAATAATTTATTTGTCAGGTTGCCAGGTGCTTCTAAACTTCTGTGTATTTTTTCATA 
TCCAATTTTACTTTAAATATTTTTAGAAAAGAGGTCTGTTAAATTTCCTAATAATTATTA 

TTTTCTGTGTCTCTGGGGTCTTACCTATTTGGGGTCAAAATAAACAAGTTTATTAAGCTT 
ATTAATATTCAATTTCATTATCTTCTTTAACAATTATGTTCCCTGGTAGTTTCATTGCCA 
ATAATTTATTTGTCAGGTTGCCAGGTGCTTCTAAACTTCTGTGTATTTTTTCATATCCAA 
TTTTACTTTAAATATTTTTAGAAAAGAGGTCTGTTAAATTTCCTAATAATTATTATATTA 
TTGTTTTTTCACTGACATTTTGTGAATTGAAAACCCTTAAAAATATGAAATCATTTTTTC 
[C,G] 

AAATATGTGCCACAGACAATTTTGTTAAATAAGAAGACAGAAACAGGGCATTATCAAGAG 
ATAAATATTCAATATACCTTATATTTCTGTCACACATTTTTATACCAACTGTGCCAAAAA 
TTGTATATCATATAAATGATAACAAGTTCACAAAGGCATTCCTTTATCCCTTAACTCTCA 
AATTAGAAACTTTCATAGGTAGGAAGTAGGGGAAGCATATATTCCCTTTGAAAGGTGCAA 
GAAAATGTCATTGGCATTCACCATGGTACTCTTCAAGCTTAAAAAAAATGGACTGCAAAA 
SEQUENCE LISTING 

<110> PE CORPORATION (NY) 

<120> ISOLATED HUMAN DRUG-METABOLIZING 
PROTEINS, NUCLEIC ACID MOLECULES ENCODING HUMAN 
DRUG-METABOLIZING PROTEINS, 
AND USES THEREOF 

<130> CL000897PCT 

<140> TO BE ASSIGNED 
<141> 2001-10-05 

<150> 60/241,745 
<151> 2000-10-20 

<150> 09/739,456 
<151> 2000-12-19 

<150> 09/818,647 
<151> 2001-03-28 

<150> 09/852,067 
<151> 2001-05-10 



<160> 4 

<170> FastSEQ for Windows Version 4.0 
<210> 1 
<211> 2327 
<212> DNA 
<213> Human 

<400> 1 

cgcgcctgcc tcctctcccc aggcctgagc tgcccctccc actgcctttc cttcttcccg 60 

cgagtcagaa gcttcgcgag ggcccagaga ggcggtgggg tgggcgaccc tacgccagct 120 

ccgggcggga gaaagcccac cctctcccgc gccccaggaa accgccggcg ttcggcgctg 180 

cgcagagcca tggaattctc ctggctggag acgcgctggg cgcggccctt ttacctggcg 240 

ttcgtgttct gcctggccct ggggctgctg caggccatta agctgtacct gcggaggcag 300 

cggctgctgc gggacctgcg ccccttccca gcgcccccca cccactggtt ccttgggcac 360 

cagaagttta ttcaggatga taacatggag aagcttgagg aaattattga aaaataccct 420 

cgtgccttcc ctttctggat tgggcccttt caggcatttt tctgtatcta tgacccagac 480 

tatgcaaaga cacttctgag cagaacagat cccaagtccc ggtacctgca gaaattctca 540 

cctccacttc ttggaaaagg actagcggct ctagacggac ccaagtggtt ccagcatcgt 600 

cgcctactaa ctcctggatt ccattttaac atcctgaaag catacattga ggtgatggct 660 

cattctgtga aaatgatgct ggataagtgg gagaagattt gcagcactca ggacacaagc 720 

gtggaggtct atgagcacat caactcgatg tctctggata taatcatgaa atgcgctttc 780 

agcaaggaga ccaactgcca gacaaacagc acccatgatc cttatgcaaa agccatattt 840 

gaactcagca aaatcatatt tcaccgcttg tacagtttgt tgtatcacag tgacataatt 900 

ttcaaactca gccctcaggg ctaccgcttc cagaagttaa gccgagtgtt gaatcagtac 960 

acagatacaa taatccagga aagaaagaaa tccctccagg ctggggtaaa gcaggataac 
1020 

actccgaaga ggaagtacca ggattttctg gatattgtcc tttctgccaa ggatgaaagt 
1080 

ggtagcagct tctcagatat tgatgtacac tctgaagtga gcacattcct gttggcagga 
1140 

catgacacct tggcagcaag catctcctgg atcctttact gcctggctct gaaccctgag 
1200 
catcaagaga gatgccggga ggaggtcagg ggcatcctgg gggatgggtc ttctatcact 
1260 

tgggaccagc tgggtgagat gtcgtacacc acaatgtgca tcaaggagac gtgccgattg 
1320 

attcctgcag tcccgtccat ttccagagat ctcagcaagc cacttacctt cccagatgga 
1380 

tgcacattgc ctgcagggat caccgtggtt cttagtattt ggggtcttca ccacaaccct 
1440 

gctgctgtct ggaaaaaccc aaaggtcttt gaccccttga ggttctctca ggagaattct 
1500 

gatcagagac acccctatgc ctacttacca ttctcagctg gatcaaggaa ctgcattggg 
1560 

caggagtttg ccatgattga gttaaaggta accattgcct tgattctgct ccacttcaga 
1620 

gtgactccag accccaccag gcctcttact ttccccaacc attttatcct caagcccaag 
1680 

aatgggatgt atttgcacct gaagaaactc tctgaatgtt agatctcagg gtacaatgat 
1740 

taaacgtact ttgtttttcg aagttaaatt tacagctaat gatccaagca gatagaaagg 
1800 

gatcaatgta tggtgggagg attggaggtt ggtgggatag gggtctctgt gaagagatcc 
1860 

aaaatcattt ctaggtacac agtgtgtcag ctagatctgt ttctatataa ctttgggaga 
1920 

ttttcagatc ttttctgtta aactttcact actattaatg ctgtatacac caatagactt 
1980 

tcatatattt tctgttgttt ttaaaatagt tttcagaatt atgcaagtaa taagtgcatg 
2040 

tatgctcact gtcaaaaatt cccaacacta gaaaatcatg tagaataaaa attttaaatc 
2100 

tcacttcact tagccgacat tccatgccct gaccaatcct actgcttttc ctaaaaacag 
2160 

aataatttgg tgtgcattct ttcagacttt ttcctataca ttttatatgt agaaatgtag 

2220 

caatgtattt gtatagatgt gatcattcct atattgttat tgattttttt cacttaataa 
2280 

aaattcacct tattccttaa aaaaaaaaaa aaaaaaaaaa aaaaaaa 
2327 

<210> 2 

<211> 510 

<212> PRT 

<213> Human 

<400> 2 

Met Glu Phe Ser Trp Leu Glu Thr Arg Trp Ala Arg Pro Phe Tyr Leu 
 1               5                  10                  15 

Ala Phe Val Phe 
            20 
Ala 
Leu 
55 
Gln 
Phe 
60 
Leu 
Glu 
Glu 
Tyr 
95 
Lys 
Lys 
Lys 
Pro 


Leu 
Leu 
Lys 


Gly 


Leu 
125 


Ala 


Ala 


Leu 
Gly 


Pro 


Lys 


Trp 


Phe 


Gln 


His 


Arg 


Arg 


Leu 


Leu 


Thr 


Pro 


Gly 


Phe 
<213> Human 
<223> n = A, T, C or G 
ccagcctctc ttaggctcct aaatatagtg caaaaagttc cagagttcct ttgttaccca 60 
tgaaagcaca tggaacggtg ctggacaggg gcaactggcc ctggagcaga ggagtaactg 120 
catagaactg tccaagcctc agagggagtc acaccaccag caagaacctg ggtgggagta 180 
ggtgagccaa ggggttccca ggctctgacc ctgccaagag aactcattag aaggtcacca 240 
accacacata ctattcctcg gtctcatgaa gaacccaggg accggaccag gcaagatatc 300 
acaaagctga agtttcagct ctggggcaga gcatggatct caggtctttg gccctaccac 360 
catgcgatca tatgagggcc atcatacaac catcatgatt tgggggagga atagggcata 420 
gaggaatcat atgaaaagct gaaatgccat gagttaccca caagaagctg tgtaagccag 480 
aggattctga gaccctgtca aataacaaca tctagttgaa ggttggagtt aggtaggagg 540 
tagggaagtc tgggaaagaa ggagctgaaa cacttgctgt gtgtggctta atggaacatg 600 
caaggggcca ggacgaactt ggtccagatg aagtcaccac cccctggggc ctgtcttttt 660 
tttttttttt tttttttttt tgagacggag tctcactctg tcaccaggct ggagtgcagt 720 
ggcgcgatct cggctcactg caatctttgc ctctcgggtt caagcgattc tcctgcctca 780 
gcctcctgag tagctgggat tacaggcgcg cgccaccacg cccagctaat tttagtactg 840 
ttagtagaga tggggtttca ccatcttggc caggatggtc ttgatccctt gacctcgtga 900 
tccgcccgcc tcggcctccc aaattgctgg gattacaggc gtgagccacc gcgcccggcc 960 
ccctggagcc tgtcttaatc acttacccgc caaataaaat ctggctccag agagtggagc 
1020 

gtaggcttaa ggaattgggg gcggaagggc ggggaaggtg ggggagggac agtgataggg 
1080 

agaacaggga attgtagcag aaattgggtt tattgttcag agctgtcaat gaacacttaa 
1140 

catatgcctg tcttagccta aatcaatgaa taaatgaatg aataaataaa tgaatgaaat 
1200 

gtgggcaatg cctataaaga ttgctgggac agggaggtgg ggggagacac cagcttggga 
1260 

agtcaggcct gttagatcct agttcaccac ctgatacgtt acaaatacta aaaccatcac 
1320 

tttcaaatta tttttactac attttcctgt tatctgtact cgagtttatt tatgtttctg 
1380 

gcatctagag tcagcccttc atgggcatga gacccaagca gccacacgag gctctgaacc 
1440 

cagaagagca tatgctcggt ttaatggtct gtcatcttag aattgttaat aaagttttta 
1500 

tcccgcattt tcattttgca ctgagattca taaattatat agcaggccct gactgtacct 
1560 

gtatagtgga attactatat gatggtacgc tactgtgcat atcttccccg ttcagtgttc 
1620 

agtgccctcg tatcggcagc ttgaactagc tcatggtaca cgctgggaat cagggtggga 
1680 

atcagttgta aaccatttac cggaacacca ctaggcaggc cacaggataa aggaataatg 
1740 

atggtacacc tccccctacc tctaccacct gggaattttg gtagaatgcc agaatggaaa 
1800 

agaaaatctc ttgcatagcc atttataatt tgtgataagg aagaaaaaca atgacctcag 
1860 

ctittagcatt attttacaat ataaattcag atcccgtgac tgaaaactgt tggacttaaa 
1920 

agaggacgct ccaggagcgc aaaagcagtt gggccgaacg aagcgtgcgc gctttggtaa 
1980 

ccggctagaa atcccgcacg cgcgcctgcc tcctctcccc aggcctgagc tgcccctccc 
2040 

actgcctttc cttcttcccg cgagtcagaa gcttcgcgag ggcccagaga ggcggtgggg 
2100 

gtgggcgacc ctacgccagc tccgggcggg agaaagccca ccctctcccg cgccccatga 
2160 

aaccgccggc gttcggcgct gcgcagagcc atggaattct cctggctgga gacgcgctgg 
2220 

gcgcggccct tttacctggc gttcgtgttc tgcctggccc tggggctgct gcaggccatt 
2280 
aagctgtacc tgcggaggca gcggctgctg cgggacctgc gccccttccc agcgcccccc 2340 
acccactggt tccttgggca ccagaaggta aatggaaggg aaaaaggnta gaaaaggagg 2400 
aagagggggg cggaggagga tgcggcagag gagcccagcc ggcagagaga cgcagctttc 2460 
ttccatccct ggggaccctc cggcttgcac cggcctttcc agcccggcct gtggctctta 2520 
gcatcatttt tccttgctct ggagaattgc tttcccgcag ccccacaggg aaaggtcaca 2580 
aaagaggaag ctttgggggc tgggagagag ctatttaaag aacctgaata tggaaaaaga 2640 
aagcgagctg taactcaagt ctgtctctca ttgcttcacc aagccttcca catgtgttgc 2700 
tttaaaaata gcatgttatt ctaaataact tattagttgc agaaaatatg caaaatctat 2760 
cccaatcgtt ggcaccctta gtccatttta acaagagaaa attttctttt cctaagattc 2820 
ttgtgaagta aggagcagcc ccagccagcc actcgagaaa tactgattga tggaaatttg 2880 
taaagggaga ctgttagctt ttggtctctc ccgtttttta aatccactcc cacccctaat 2940 
taaggttttt attcattcaa ccgactctga gtggcaattg tgtgataggt actaagatta 3000 
caaagagaag ctaagtccct cccctgcacc acccaagtca ggtgcagact taggccacag 3060 
agagaaaatg aaaatttaag gcaatgggtg ctttactaga ggcctagaga caagggaata 3120 
tctgtcggag gaaagtatac atctccgcct agagaaggaa ggaaagtctg tgaagggctg 3180 
agcagagtct taaaggatgg ttgggtggtg tggggaaggc attccagcag agctactaca 3240 
cgatcctttg gtttccccac tttctagtct ttcttatata aagcaaccac tttcaactct 3300 
tttatcggtt tcttctggta tttaaatact tatttgtaaa atagtattac catattgcat 3360 
ctattaattt aataagttta gacatctgct gtggtttaga tatggtttgt tcgtccccac 3420 
caagcctcat gttgaaattt gattcccaat gttggaggtg ggatctgatg ggagatcttt 3480 
gggtcattgg gatggatccc tcatgaatgt cttggtgcag ctgtctcctt cataagttct 3540 
cactctctta gtccctcttc aacccccaga actgattgtt gaaaagagcc tgccacctcc 3600 
tcccctctct cttcctgtct ctcaccatgt ggtctctgca cacaactgct cctgttcact 3660 
tccactatga gtggaagcag tctgagatcc tccgcagatg cagatgccaa tgccatgctt 3720 
cttgtacagc ctgcagaatt gtaacccaaa taatcctctt tgtgaatgac ccagcctcag 3780 
gtattccttt acagcaacac aaatgtacta agacaacatc cacctatgaa cttctttatg 3840 
acaggcaatc acttacactt catattccac tgtcccagta actatatagt attgtatttt 3900 
ttaaatagaa aaacttctat ttgtattatt tttattatgc aaatgttatt tactgctgat 3960 
ctaaatggtc ctctttcatt ttatttcctt ttctcataga actttttccc cacccccaca 4020 
gtattgnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 4080 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 4140 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 4200 
nnnnnnnnnn nnnnnnnnnn ntgttatgta tctctactgt ctcatgaata ctatgtcgtc 4260 
tgttgtttta attgaattgt tttggcatcc ttgtcaaaaa tcaattgacc ataaatgtca 4320 
aggtctattt ctgagtcttc aattctaatc cattgatcta tatgtctatc ctaactcatg 4380 
gacacagaga gtagaaggat ggttaccaaa ggctgggaag gatagagggg agctggggga 4440 
ggaggtaggg aaggttaatg ggtacaaaaa aaatagaaag aatgaataac accbactatt 4500 
tgatagcata gcagggtggc tatagtcaat aataactgta cacttttaaa taaagagtgt 4560 
aataggattg tttgcaactc aatggataaa tgcttgaggg gatgggtacc ccattcttca 4620 
tgatgtgcct atttcacatt gcatgcctgt atcaaaaaca tctcatttac tccataaata 4680 
tatacaccta ctatgtatcc acaagtatta aaaattataa ataaataaat tatatagcta 4740 
tccttatgct agtaccacac tgccttactg ttgctttgta gtaagctttg aaatcaggaa 4800 
gtatgagtcc cccgcacttt ggtattttcc aagattattt tggctgtttg gaatccttga 4860 
tttctataca aattttagac tcagcctatc aatttctaca aggaaaccag ctagggttct 4920 
gcttgggatt gcactgaatc tgtagatcag tttggggatt attgccatct taagaatatt 4980 
aggtcttctg atccatgaac acagaaagcc tttccgttta gttaggtcat ctttaatttt 5040 
ttttgttgtt tttttttgtt ttttgagaca gagtcctgct ctgtcgccca ggctggagtg 5100 
cagtgacgca atctcggctc actgcaacct ccgcctctcg gattcaagcg attctcctgc 5160 
ctcagcctcc caagcagctg ggactacagg cacatgccac cacaccaact aatttttgta 5220 
ttttcagtag agacggggtt tcaccatatt ggccaggcta gtctcgaact cctgacctcg 5280 
tgatccaccc gcctcaccct cccaaagtgc tgggattaca ggcgtgagcc accactcccg 5340 
gctttcttta atttttttta acgatgtttt tgtatttttc aaagtataca tcttgcattt 5400 
cttttgttaa atttatttgt tttgttcttt ttaatttcat ttcagactat ttattgcatt 5460 
catagtgttt tagagtccac attccctctt gactgtcact aagttttttt ttttctgttt 5520 
ttgagaggtt tctatcagaa ttttgcagat cagagatgac ggacatgtca aactgtctaa 5580 
tattaccaac cctccccatt tatcagatca ggatcctttt ggtgattcac catgcaggga 5640 
aatctagtat ctaaggctca aaaggtgata ctgttttaca taggcagtaa cattttattg 5700 
ctacataata actacatatt tatggagtac ctgtgatatt ttgatacgtg catacaatgt 5760 
gcagtgatca aatcagggtg tttagggtat tcatcacttc taacatttat tatttatttg 5820 
tgtttggaac atttcaagtc tcttcaagct cttcagaaat attcaataca ttattgttaa 5880 
cagtgctatt gaacactgga acttattcct tctatctaaa gacagtaaca ttttaagtat 5940 
agtcataagg ttacagaagg ataaagtgtg tatagggaaa attccctaca agatgagaat 6000 
ttcattcctt actcttagta atacaggtct tcaaacatgc caaggatatt cctcccttgg 6060 
agctttgaac atgcacgtct gtggttatat tgctctccct gcaaattatt cctaaaagag 6120 
gcttgccctg accattcaga ctaaaatagc acctctagta ctctctatct ccaaccctat 6180 
tattattatc ttggccctta tcactctctg acactatact gtatactctt ttgcttgttc 6240 
gtttattatc caccactaac tacaatataa aatctgtgag aggtaggatc tttgtttgcc 6300 
actataaacc tagtgcatgg tacagttcct ggtgcataat aggtgctcaa taaatccttt 6360 
gttgaatgca taaatatatt aggtgctgag aaaatttatt tattcaaaga tcaatttact 6420 
gcatagaata ggccaggtgg tttgacattt attcaatagc caacatatgg gacctaggat 6480 
gtacatatgc aagtgtgtgt gtgtatgtgt gtgtgcatct gcatgtgtac ttggatgtac 6540 
tgcagagaac atctatgtag ctaagtagta taaagcactt gggctccaga gttaaactgg 6600 
agtttgaatc ctcattagtg gttgccagct gtacacactt gggcagatca tttaacctag 6660 
tctgtagggc tcaatttcct catctctaaa gtagggattg taatcatatc tacttcatag 6720 
ggttcttgat gtaaatatta aataacatag aacatggaaa gcatttagca gcacctagtt 6780 
catagcagtg cttgataaat gttcgctgtt gctatttggg ggcactatgc attttctgaa 6840 
catttctgaa caatgtttac taaatatatg tagtacccgt tttcaagtgt atttagatgc 6900 
ttctctgggg atgaagaaat ataaattaaa tatagtacag tattcacaac agttttctgt 6960 
cctttttgtc tagtcaggag ttacaaaaag tataatgaaa tactttcata tggctggggt 7020 
gtttatgaaa attttttacc taaacaaaca attgtcatat tagtttacaa tattcatgag 7080 
ggcaaaggcc ttgtcttcct tatatttctc tgtatctcta ccacctggta cgtgtgatag 7140 
acaataaata cttgtgtgtt tattgtttgt aaatgaataa atgaaaaaat attcacattg 7200 
ttgaaaacca ctactctgga tagtcagtgg gtgcttatca ctggcttgat tatggcaaca 7260 
ttaacaaaaa agtgcagtat tttagaaact aggtttcaag actctcaacc tttcagtggc 7320 
cttgaactat ccagagaaca ctttatgggt taaaattgct aaatgataac agagaaaaat 7380 
gggagccaga gttgtccacc tctccagagg atgagagcaa acaatcctgc agcagatacc 7440 
gtgtgattgg tcacacgagg aaaaatctgg cagccttaag attactttgc agcgggggac 7500 
tcccaccatc atgctcaagt gtgtagatgg gcacaccaaa acacacacat gcaggtgccc 7560 
tccactttac acaagaagca aatgtaaatg aatcttgttt tcagtgattt agagaaacaa 7620 
tttaagtgag ccattactca tctgcttcta aaagcaaaaa ctccttctct ggtggtagta 7680 
tttgcactct catttgtaaa tgttggaagc tgaaagtttt gtatttgagt ttgctttaag 7740 
attcacacat ctgtgtaaat ggaccttctg ttgttggggg gagaatttgg attttcttta 7800 
tagatagagt tggcaatttt ttagagagaa gcatttactg ctaagtcatg agaaataatc 7860 
actggtgcat aattagagag aggaacagga agaagaaatg gtgagctgga tgtagggtca 7920 
tgccccattt agtaactgtt agtttcccac ataggaaata cttcttttta gcttccagat 7980 
cccactccaa tctgagtgtg tgatgttggc aagtgaggca gagagtgtga ctcggctcac 8040 
cctctattgg gacaagagtt cacagtaaat gtcattcaac agtgacttgg tctgggggta 8100 
caggatatat taatattgag aagataaata cactaacttt gtttagagaa ttatccccca 8160 
agcttagaag tcccaaagaa agcatgttat gtcacttcca gaaaagtctc aggctcctct 8220 
gcttgtgtga ccttatcagg tcctgaactc agcttgtgtc tataagaggg gacaggtcca 8280 
gcttggctgg ctaattactt ttactttttt cactgcagtt tattcaggat gataacatgg 8340 
agaagcttga ggaaattatt gaaaaatacc ctcgtgcctt ccctttctgg attgggccct 8400 
ttcaggcatt tttctgtatc tatgacccag actatgcaaa gacacttctg agcagaacag 8460 
gtaagaagag ggggaaagct ctgggaccta ttcctcctag aagtgaaatg cataaaaccc 8520 
ataggcaaga ttccaaagca aagattggtt tggggccttt aagagacaca gcagcaagta 8580 
tggggaggtg acaggtttcc taccaatact gaaggggatt cccatatcct ccccagtccc 8640 
ttgtcttgtt caggtatgca tgggcacgtt gaagtcggta taacttaaag cctagctggc 8700 
attaccagac ttgccaggca aggcttccct tggcctctgt gggttttatg acttcagtgt 8760 
cagcaacact tcccactcct acccctggtc tcgagcataa gtctcaagag ggtgggaaat 8820 
cagcagtaac tctacctctg ctggttcagt atgaaagcct gaatgctaga tcattaattt 8880 
acccatcaga cctcttgatn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 8940 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9000 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9060 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9120 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9180 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9240 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9300 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9360 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9420 
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 9480 
nnnnnnnnnn nnnnnnnnnn 
9540 

nnnnnnnnnn nnnnnnnnnn 
9600 

nnnnnnnnnn nnnnnnnnnn 
9660 

nnnnnnnnnn nnnnnnnnnn 

9720 

nnnnnnnnnn nnnnnnnnnn 
9780 

gcagaaattc tcacctccac 
9840 

tcattcaaag tcccctttcc 
9900 

agcacaaaga gtgcaaggta 
9960 

agctctaagg tcaacacacc 
10020 

aaagtcttat tctcaaggca 
10080 

gccttggcca ggcacagtga 
10140 

gaggatcact tgaggtcaag 
10200 

ctactaaaaa tacaaaaatt 
10260 

tgggaggctg agacatgaga 
10320 

attgtgccac tgcactccag 
10380 

acaaaaaaaa aactacccaa 
10440 

tctctcgttt tcttggatgt 
10500 

tcagtaaaat tttgctctag 

10560 

aattacataa ttggtattta 
10620 

gtattaggaa ccacttaaat 
10680 

tcttggggaa gtcatttaat 
10740 

gagactcaca ttgctgggct 
10800 

tgtaatggcc accattgtat 
10860

cattcaatga atggtatcaa 
10920 

cctaatgacc agtctggcaa 
10980 

atagtatcat tcatagacct 
11040 

ttgtctcttt ccttgtgaat 
11100 

aaggaattca cttgctctgt 
11160 

ttcactttca gttaacctcc 
11220 

ttattgagaa aagtaaacct 
11280 



nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnntctgct tgactctgca 
ttcttggtat gtatgtgcaa 
atagtagagc atgccaaaga 
gagctatact gaacgttatc 
accacttccc agaaagcttc 
gcagatacat gaatctgtcc 
ctcatgcatg taatcccagc 
atttcaagac cagctgggcc 
agccaggcat ggtagcatgt 
atcgcttgaa cctaggaggt 
actaggtgac agagcaaaac 
actgcagtct caccatccct 
tttcctttct ttttggagtt 
agtttggcaa tattctgtca 
tgttaaacaa gacatgaatg 
ttgaatcttg ccccctcctg 
ctctccctat ctcagtttcc 
gttatgagga ttaaatgaaa 
gagtgacaga tcatgcatca 
ttatgtatta ataaacttta 
tagaagattg tgaagcatta 
gggctcaagg aggaaatatc 
ttatgttcat catatagttt 
tactagtgtg agctagggag 
acagcaacac agggaaaaag 
caggaagatt gagtcactta 



nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
gatcccaagt cccagtacct 
atgagaggta taacccactc 
aactgaaatc tgaattcaaa 
taggggaaag attgaagggg 
ttcatccgtt tctctcccac 
cctctctctt taaaactaca 
actttgggag gccaaggtgg 
aacatggtga aatcccatct 
aggcctgtag tcccactact 
ggaggttgcc gtgagctcag 
tctgtccgca gcccccaaca 
attcttgttt tctttatcct 
cctttatttc cacatgcgag 
gcagataaac taagctcttt 
aaagaaaaga atataggctt 
cattgactag ttaaatatga 
tcatctttga caataaggat 
tacatatttt tagcactaca 
tgagcctgga atgttgtaag 
aagtcctttt aaagccaaat 
gccttggtaa gtatttccac 
aggggacaga gtggacactc 
atggattggt ttggagtgga 
taggttggct accttatgta 
gtatttagta tcatagttca 
ttcagttact acataggtag 
taactggtga tttcaggatt
11340 

tgaaactgtt tctcacaata 
11400 

ttcatctatt aaattgcatt 
11460 

atgactgcct gtgggtcatg 
11520 

tcatctaatt aaatggcata 
11580 

gctctagacg gacccaagtg 
11640 

aacatcctga aagcatacat 
11700 

aaagggggaa agtgctctgt 
11760 

gtgttttgtg ggccatgaaa 
11820 

acttattgaa caataggtgt 
11880 

taaaaagatc tggtcctctg 
11940 

accaaatggt gatagttaga 
12000 

tacaaaccta ccccaaagtc 
12060 

ttaaaaagaa acattcaata 
12120 

nnnnnnnnnn nnnnnnnnnn 
12180 

nnnnnnnnnn nnnnnnnnnn 
12240 

nnnnnnnnnn nnnnnnnnnn 
12300 

nnnnnnnnnn nnnnnnnnnn 
12360 

nnnnnnnnnn nnnnnnnnnn 
12420 

nnnnnnnnnn nnnnnnnnnn 
12480 

nnnnnnnnnn nnnnnnnnnn 
12540 

nnnnnnnnnn nnnnnnnnnn 
12600 

nnnnnnnnnn nnnnnnnnnn 
12660 

nnnnnnnnnn nnnnnnnnnn 
12720 

nnnnnnnnnn nnnnnnnnnn 
12780 

nnnnnnnnnn nnnnnnnnnn 
12840 

nnnnnnnnnn nnnnnnnnnn 
12900 

nnnnnnnnnn nnnnnnnnnn 
12960 

nnnnnnnnnn nnnnnnnnnn 
13020 

nnnnnnnnnn nnnnnnnnnn 
13080 



agcgtgctaa tcttataagg 
ttaaatacat ccatcccaga 
gcacattaat acgagtacta 
gttactccac gctgcctgtg 
aggttttctg ccttttattt 
gttccagcat cgtcgcctac 
tgaggtgatg gctcattctg 
gcattgcgaa atgctcccag 
ataaaaaatc agtttctaaa 
ctgtaaaaaa tttgttatgt 
tcttagatat attttgagat 
tagtaagtgc tgtagatgtg 
tgaggaaact gagaggctga 
gaggctttca aacaaaaacc 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 



ctttgaaatt tattagactt 
ggtaagcttc taaattcacc 
ctttgatact ccactgttgc 
ttcctcatct atccttcatc 
ctcaaggaaa aggactagcg 
taactcctgg attccatttt 
tgaaaatgat gctggtaagt 
caatggacag tattaggtat 
aatttaacca atgtacacgt 
tctttgagtg ataatattaa 
tttatggcag caaaccaagt 
tttcatggag ggcgggtctg 
agaaaaaggc tgacagtttc 
atnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn nnnnnnnnnn 
nnnnnnnnnn
13140 

nnnnnnnnnn 
13200 

nnnnnnnnnn 
13260 

nnnnnnnnnn 
13320 

nnnnnnnnnn 
13380 

nnnnnnnnnn 
13440 

nnnnnnnnnn 
13500 

nnnnnnnnnn 
13560 

nnnnnnnnnn 
13620 

nnnnnnnnnn 

13680 

nnnnnnnnnn 
13740 

nnnnnnnnnn 
13800 

acagctcctc 
13860 

ttctgagttt 
13920 

agtttggaat 
13980 

gtcaatgatt 
14040 

aaaagccgca 

14100 

ctcactccgc 
14160 

actcatgtga 
14220 

ggccaggcag 
14280 

cttgacctcc 
14340 

gaggtctatg 
14400 

aaggagacca 
14460 

acattttcta 
14520 

cttatgcaaa 
14580 

tgtatcacag 
14640 

gccgagtgtt 
14700 

tgccatgatt 
14760 

ctttgttatt 
14820 

ttataagact 
14880 



nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
tccacacttg 
attttaaaat 
tgaagttgga 
gctttaatgg 
gatctttaac 
tgggtgtatt 
tctttggttc 
gagccttcaa 
tgtgccagga 
agcacatcaa 
actgccagac 
agttgtttat 
agccatattt 
tgacataatt 
gaatcagtac 
gtactgtgtc 
aatggagctt 
ttgcttcaac 



nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnng 
ctctgtttct 
gtggctatgg 
ggaagccctg 
aactctggtc 
tcagccattt 
tttcccttcc 
ctgctgggtc 
atgaaggcaa 
taagtgggag 
ctcgatgtct 
aaacaggtca
taacacatta 
gaactcagca 
ttcaaactca 
acaggtattt 
tgtctagagg 
ttatatagac 
catagcagta 



nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
gtcaggcttt 
cacttttgaa 
tggttgagag 
gggtaaaccc 
tgtttgaaag 
accatatatg 
tcgtgccctg 
agggttgtct 
tttggtcatg 
aagatttgca 
ctggatataa 
gtggtgggag 
tcccaacttt 
aaatcatatt 
gccctcaggg 
gttgggtttg 
gataaacctt 
actgctccaa 
ttatcagaat 



nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
gctgggggca 
tccaaacgtt 
cagtggcagg 
cttgtaatta 
cagagttatg 
cagttttctc 
tgtaagcaca 
ccattagatc 
gtggtggtga 
gcactcagga 
tcatgaaatg 
agcaaaaaag 
ctcttctagc 
tcaccgcttg 
ctaccgcttc 
ggttgcccac 
aatatgacaa 
agaaatttga 
ttttatatat 



nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
nnnnnnnnnn 
gctccctgca 
tttgaaaatg 
gtacctagca 
tgggtcttgt 
gtaataattg 
catgctcctt 
tggcttattt 
ataaaaacag 
tgatgttggt 
cacaagcgtg 
cgctttcagc 
atatttcttc 
acccatgatc 
tacagtttgt 
cagaagttaa 
gtccatacgc 
gagaaagaat 
cttgagtcct 
atatatatac 
actattttta
14940 

agacatacat 
15000 

aaacattcat 
15060 

aagaatattg 

15120 

gctttttaag 
15180 

cagtgaatta 
15240 

tgctttgttt 
15300 

gttgttacga 
15360 

agatgttaaa 
15420 

cacaccaaca 
15480 

agaggacact 
15540 

gcagcagttc 
15600 

cttcattgga 
15660 

atctttactt 
15720 

aaaatattaa 
15780 

ttttttgaga 
15840 

tacaacctcc 
15900 

attacaggca 
15960 

aaatatatca 
16020 

tctattccag 
16080 

ttgtcaagat 
16140 

gtagtagagc 
16200 

cagggccccc 
16260 

tttcttgtag 
16320 

atttagaatt 
16380 

ccagttgtcc 
16440 

cacattctaa 
16500 

ctcgtgtatt 
16560 

gattttgtgg 
16620 

gaaaaattca 
16680 



ttatggacaa 
ggaatatggc 
gcaacatagg 
attcaaaaac 
caattcaaca 
atccttcatc 
cagtgtcatc 
tgagtgggtt 
tgtaaagtat 
tatgcacact 
cagagaagca 
ccaagtcttt 
gtaaacagga 
tttaaatatt 
atgttaaatt 
tggagtctct 
acctcccagg 
ctgccaccac 
atattttatt 
ggtttctcaa 
gtgtgcattg 
tctgatagtt 
agttgagaac 
tacttgtata 
tattcctata 
catcaccatt 
acggatacat 
ccattgatct 
cttttttcaa 
gcagaaagta 



ttattattaa 
tttttgcaca 
aatggagagt 
agttttagca 
ttacttgtca 
agcttcacca 
tataaaatgg 
aatatatata 
agttatgtca 
tacacataca 
ggttataaac 
ctgcatcatt 
atggatttgg 
atacaattat 
tattcaaaac 
atcactcagg 
ttcaggcaat 
acctggctaa 
ttattgcatc 
ccctcagcac 
taggatgttt 
atagcaacca 
cactgccctg 
atttcattat 
tatggtgtga 
atttaaaagt 
gtactggtat 
atctaccaat 
cattaataga 
cagagagttc 



tacaaatata 
gcgattgcag 
ggaacagagt 
agcataaaca 
tgaatgccat 
cttactagca 
agattaaaaa 
aagcatttag 
aatgtctttg 
tatatgcata 
aatttaaggc 
gcacacacag 
gggaagctat 
gatgaaaaag 
ttaaaacctt 
ctggagcgca 
tctcctacct 
tttttttaaa 
tggattttta 
taatggcttc 
agctacatcc 
taaataactc 
tacccaggtt 
tttcatattt 
ggtattgatc 
ttatcttttc 
ctgttttgga 
gtaccagaat 
ccttattttt 
tcatattacc 



agtaggcact 

taataataat 

aaacatggac 

caaaagttga 

aatggagaat 

gttactagta 

agaacctatc 

gacagtgcct 

cttccaggaa 

catgcacata 

ataaatgggc 

aaaatgttaa 

acagaacttt 

caaaatgcaa 

ttcaattttt 

gtggtgtgat 

cagccttctg 

ttgtttattt 

gtaatcacaa 

ttagattaga 

ctgacatcta 

cagacattat 

gtagagaaaa 

aaatcagaga 

taatttttcc 

aagtgatttg 

taagagtata 

cacactgttt 

agaaaagttt 

catgtaacaa 



taagagttcc 
gacaagctaa 
atgcacccga 
aatagattaa 
acttatcaag 
agttacttac 
tcatacattt 
ggcactgaat 
ttttgcaaga 
gatattataa 
attataaata 
tgtttttgtg 
gtaaaaaaaa 
agtgttaggg 
tttttttttt 
ctcagctcac 
agtagctggg 
ttatttagtc 
aaagccattc 
taagtccttg 
cccactcgat 
tgaatgttcc 
ttatttatgt 
tctaaactcc 
aaatgtttat 
agataaccat 
tttggatgtt 
taattaagga 
taggtttgca 
acctgtacat 
gtacccctgt
16740 

gttccatatt 
16800 

tttctttgcc 
16860 

atattagact 
16920 

aattttattg 
16980 

agctttaatc 
17040 

taaacgtctg 
17100 

tttatttgac 
17160 

atccaggaaa 
17220 

aagtaccagg 
17280 

gcctgctcaa 
17340 

gacaagaata 
17400 

gcaatttgag 
17460 

taaacgaacg 
17520 

catgaaaaaa 
17580 

tgggagtttg 
17640 

ttttcttatc 
17700 

atttgttgaa 
17760 

tcattattgt 
17820 

attgtacttt 
17880 

tgagcacatt 
17940 

actgcctggc 
18000 

tgggggatgg 

18060 

tccccctgag 
18120 

atatgtgtag 
18180 

aaattattct 
18240 

cattggtttt 
18300 

gatgcattgg 
18360 

ccaagtggag 
18420 

ttagacatct 
18480 



atctaaaata 
tttgttttgt 
tgtcttctat 
gaggaagaaa 
cccacatact 
aaaagtccct 
gaaatcaggc 
tttgccatca 
gaaagaaatc 
attttctgga 
gtgaccagtt 
aaaccgattg 
acaaagagag 
atttgacaag 
tggtcacatg 
aagagaattt 
taaccttggt 
ctgaattaaa 
ggttgtatct 
ctaggatgaa 
cctgttggca 
tctgaaccct 
gtcttctatc 
attttgcttt 
gtgaaacaga 
cttgtctttc 
ctaatcaatg 
aaatgaggac 
ctcctcttct 
cccagccgct 



aaagttgaaa 
tttttttctc 
ttcattccat 
agaataattg 
gatggaaact 
ttgatgagaa 
aagatttgaa 
ctttggtaat 
cctccaggct 
tattgtcctt 
aattatgtaa 
actaaattta 
aattctgcaa 
atttgaggat 
gataaacgta 
ctagggcctg 
tctcctttat 
atttaaaccc 
ccaaacattit 
agtggtagca 
ggacatgaca 
gagcatcaag 
acttggtaag 
attttttgcg 
agaagtaggc 
aggaaaaaaa 
gtgtgtgaaa 
ttgatccctg 
gagagagctg 
tgtcaccaat 



ttttttaaat 
tcagctcctt 
tttatttaat 
gtcacttgca 
atgtttttta 
aataaaccat 
gctattcact 
tggaaactat 
ggggtaaagc 
tctgccaagg 
gtaggtgggt 
actgtacttt 
ctgtgtcgct 
tgtcatatgg 
aaaattatga 
ttgatcgagg 
gctttgggca 
ctatttaaag 
ataaactggc 
gcttctcaga 
ccttggcagc 
agagatgccg 
atctgcaccc 
ctggtacctt 
tacttttctg 
aaaaagttta 
tgtcttattt 
ggctggcact 
aatgattagc 
tttattcctc 



agtaaataaa 
caattataaa 
aacttttccg 
tctaaacttg 
tttgtgttgt 
ctgtgaaaat 
aaccatggct 
ttttctaccc 
aggataacac 
taaatcttct 
aagtgggaat 
gaatligatga 
agaggagggt 
atacatggat 
tgataaggtc 
gccctttgtg 
gaatatggtt 
ctctgatttt 
attttattta 
tattgatgta 
aagcatctcc 
ggaggaggtc 
ctaaattttc 
agtgacccta 
ttctttctaa 
tttatccata 
ctttatttca 
tagaacttaa 
tgcattattt 
aggattgatt 



tattacctct 
tatattggca 
tgaagataaa 
aaatcatctt 
ttatctttgg 
tagatctatt 
tgctttataa 
agatacaata 
tccgaagagg 
aaatttctaa 
gggatgggga 
gcagcttcat 
tagtaaagac 
tttagggcat 
ctgggaaatc 
caaggcctgc 
tataccacat 
tcccctcaaa 
aaatatttgt 
cactctgaag 
tggatccttt 
aggggcatcc 
ctgctagttt 
gtgcctcagg 
agagagctcc 
aattgtctgt 
ccttggctct 
acaatagggt 
aaggctcatt 
ttagacttca 
gacataatat tcgatgatat
18540 

ttaaatactg agactttttt 
18600 

aatggtctaa tacaggagac 
18660 

caatgataga gatatgtgct 
18720 

atgggagctc aaggagactt 
18780 

ccaggtggaa actgtcatct 
18840 

gcccaggtta tgctgtttgc 
18900 

acagaaagac agaaatgcta 
18960 

tcatatgcaa aagtcaactc 
19020 

tatatcagag atctcatttt 
19080 

ttaatctata tagcacaaaa 
19140 

acaaaaattg tgcatataca 
19200 

caacatgata ttttgatata 
19260 

ttcattaact catacttatc 
19320 

caattttcaa gtatacaaat 
19380 

aaacttatgc ctcctgtctg 
19440 

cccattctcc cacagcccct 
19500 

ttttagattt ccacatgtga 
19560 

cttagcataa tgtcatccaa 
19620 

tctatggcta attgttagtc 
19680 

ccagtgatgg acacttaagt 
19740 

aacatgggaa tgtagatgtc 
19800 

agaagtggaa ttgctgcatc 
19860 

acaattttcc atatggctgt 
19920 

tttctccaca tcctcaccaa 
19980 

gtgatgtctc attatggttt 
20040 

ttttaaatac ctgctggcca 
20100 

catttttaaa tctagttatt 
20160 

ttgaatatta accccttatc 
20220 

tgtctcttca ctatgttgat 
20280 



atactatagt taagtttagc 
tatgactaca atttattgtg 
^99^9^1^39^ cctccaaatt 
ggctaacaca aagacataga 
ccttgacatt tacgctgact 
ctatcttgct agactttaag 
aaagataaaa tgtgttcctg 
aggacaattc agcagcagac 
aattgaaaca tttgtaaaac 
ataaggaaat agaagccctt 
tacaatgttg agtaatcatt 
tgttatatat atatgtatgt 
tgtatacact gtggaatgac 
atttttttgt ggtaaggaca 
tgttagtaac tccaatcaca 
actgaaattt tgtatccttt 
ggtaaccact gttctactct 
gatcatgtgg aatttgtctt 
attcatctct gttgtcataa 
cattgtttat atatatacca 
tgatttctat atctgggcta 
tcttcaatgc actgatttca 
atatggtagt tctattttta 
actaatttac attccaacca 
catttgtctt tttggtaata 
taatttacgt ttccctgatg 
ttcatgtctt ctttgtagga 
tgttttcttg cttttgaatt 
agatgtatca tttgcagaca 
tgtttccttt gttgtgcaga 



aaatatggac tgaggacatt 
ggccctgtct tcggtgagct 
gcagtgtagc ataatgaggg 
agacaggtac ctaccctggc 
gcaggataag taggagttag 
cat at act gc t q 1 1 aa t g-aa^ j^- 
acataatact ggtcaaaggg 
cagataaaaa acaccatatt 
caaatttgac attataaaag 
tcctaccata aactaaagat 
tttaatttat tttttaactg 
gtgtatatat atatgatgta 
taaatctatc aatggacatg 
tttaaaatct accctcttag 
tattgtacaa tgcatctcct 
gactaacatc cctgtaatcc 
ctgcttcttt gagtttaatg 
tctgtgcctg gcttatttca 
atgacaagat atttgtcttt 
tgttttcttt atccatttat
ttgtgaataa tgctgcaatg 
tttcgtttgg ttgtatatcc 
attttttgag gaaactccgt 
aaagtgtata agggttctgt 
accattctaa tgagcatgag 
attagtgatg ttgagcattg 
atgttatttt aggtttttct 
gtgtgagttc ctcatatatt 
tgttctccca tcctttaagt 
agctttttag tttgctgcaa 
aaccatttat ctattttttc
20340 

ttgccaagaa taatatcaag 
20400 

caggtcatat gtttaaatct 
20460 

aaaggtccac ttttattctt 
20520 

agatactgcc ctttcaccac 
20580 

tgtgtgggtg tatttctgga 
20640 

agaagctcta tgctgttttg 
20700 

gatgcactcc agctttgctc 
20760 

tccatacgaa ttttagggct 
20820 

tttgatggag attgcattga 
20880 

ccaattaatg aacacagggt 
20940 

tttttttctt aatttaattg 

21000 

tatgagtaag ttctttagtg 
21060 

atacactgta cccaatttgt 
21120 

gtccccaaag tccattgtat 
21180 

tatgagtgag aacatataat 
21240 

ggtctccaat tccatccaga 
21300 

gtagtattcc atagtatata 
21360 

ttggactggt tccatgtctt 
21420 

gtgtcttttt catataatga 
21480 

ggatcaaatg gtagttctac 
21540 

ggttgtacta gtttacattc 
21600 

caccatcatc tattattatt 
21660 

tattgcactg tggttttgat 
21720 

atatatttgt tggccatttg 
21780 

cattttttga tgggattatt 
21840 

atattagacc tttgttggat 
21900 

tgtttactct gctgattatt 
21960 

acctatttat cttttcgttg 
22020 

catctgcttt tgggttcttg 
22080 



ttctgttgac tatacttcca 
aagcttttct ctatgttttt 
ttaatccatt tttagttgat 
ctactagtgc atatccagtt 
tgtatgttac tggaaccttt 
ctctttatcc tgttttatta 
gtgactagag ctctgtagtc 
tttttgctca aaattgcttt 
tttttttttt ttcgattact 
atctttgggt agtatggata 
attttgcaat ttgtgttttc 
ttttatttcc atagggtttg 
gtgatttgtg agattttgat 
agtcttgtat ccctcacctc 
cattcttatg cctttgcatc 
gtttggttct ccatttctga 
ttgctgcgaa tgcctttatt 
catcccacaa tttctttatc 
tacaattgcg aattgtgctg 
cttctcttcc tctgggtaga 
ttttagttct ttaaggaatc 
ccaccaacag tgtagaagtg 
tgattttttg attatggcca 
ttgcatttcc ctgatcatta 
tacatcttct tttgagaatt 
tgtttttttc ttgctaattt 
gtgtaggttg tgaagatttt 
tcttttgctg tgcagaaact 
ttgttgtttt ttggggttgt 
gtcatgaagt ctttgcctaa 



gagttgtatc caaaaaatca 
ttctagtagt tttatagttt 
ttttgtatat ggagtgagat 
ttctcaacac catttattga 
gtagatcagt tgacaataaa 
gtttatatgt ctcttttttt 
aatttcagat caggtagtat 
ggctatttga gtttttttat 
gtgaataatg ccattggaat 
ttttaacagt attaatgctt 
ttcaatttct ttcaccagtg 
ggtaacaggt ggtgtttggt 
gcacccatca cctaagcagt 
cctcccacca tttcccccaa 
ctcatagctt agctcccact 
gttacttcat ttagaatatt 
ttgttccttt tcatggctga 
cattcttgat tgatgggcat 
ctacaaacat gcaggtgcaa 
taccctgtag tgggattgct 
tccacactgt tttccatagt 
ttccctgttc actgtatcca 
ttcttgcagg agtaaggtgg 
gtgatgttga gcattttttc 
gtctattcat gtcctttgtc 
gagttccctg tagattctgg 
ctcccactct ttgggttgtc 
ttttagttta attaagtccc 
tttgttttgg cttggttttg 
gccaatatct agaagggttt 
ttctgatgtt
22140 

agttgatttt 
22200 

gccaattatc 
22260 

gtgttcctat 
22320 

cctttctcat 
22380 

tttgtctgat 
22440 

ttttttccaa 
22500 

aggggcatat 
22560 

tttaattcat 
22620 

ttaattgttt 
22680 

cctttgtgat 
22740 

ttgctctact 
22800 

aaaaggctat 
22860 

ttactctacc 
22920 

gatgtgtccc 
22980 

gttttggctt 
23040 

tagggtattc 
23100 

tcatgttgtt 

23160 

taagatgggt 
23220 

ctctgcttca 
23280 

ttaattatgc 
23340 

gtttcactct 
23400 

tctgcctccc 
23460 

ggcatgcacc 
23520 

ttggtcaggc 
23580 

gtgctgggat 
23640 

gcttgaggaa 
23700 

ttcaaaataa 
23760 

tgcagtcatt 
23820 

ttaaaaaaaa 
23880 



ctagaatttt 
tgtataaggt 
ccagtacaat 
tttgggtaca 
tatataatgg 
aaaagttcag 
cccttcgcat 
gcttgggtct 
ttgtattcaa 
tcttgatgtt 
taggtgcttt 
ataggttttt 
tttaaactga 
aactgccctc 
cttattgtgt 
ttaactttat 
taaattgact 
aattagtatt 
ctagtagtgg 
ttttgaaata 
aaagtgccag 
tgttgcccag 
aggttcaagc 
accatgctcg 
tggtcttgaa 
tgcaggtgtg 
gaaagaattc 
gtaatcagaa 
ttatggaacc 
ctttattctt 



tatggttcag 
gagagatgag 
ttgttgaata 
tatttattta 
tcttcttgtc 
ctacctttgc 
tcactctatg 
tgttttattc 
ggtaattatt 
ttatagatct 
tctctagtgg 
gctttgtggt 
taacagctta 
cattttatgt 
atcccttaac 
actagagata 
atgtatttac 
ctttcatttc 
tgaacaccct 
taacttttgt 
ggtaagcaga 
gctggagtgc 
gattcttctg 
gctaattttg 
cacccgacct 
agccactgcg 
aaaattaaaa 
aaacatataa 
aatctgacta 
cttccactta 



gtcttagatt 
gatccagttt 
gggttaatat 
caactatcat 
tctttttaca 
tctcttttgg 
tgtgttctta 
attcattcag 
gacagacaag 
tttgttcctt 
tgtactttga 
taccatgagg 
actttcaaca 
ctttgatgtc 
aaattattgt 
gaattaatta 
ctttatcagt 
aacttggaga 
caacttttgt 
tccatgattg 
attactcttt 
agtggcgcaa 
cctcagcctt 
catttttagt 
cagatgatcc 
cctggccaga 
tttcacatta 
aaacacaata 
gattggatgc 
taaactttaa 



taagtccttg 
catgcttcta 
ttaaagcttt 
atcctcctga 
gtttttgtct 
tttctatttg 
aagatgaaat 
ccaccctttt 
gacttactac 
tcatcctctc 
tttttacttt 
gttacataaa 
cttaaaaaaa 
ataatttacc 
agcaacagtc 
acataccacc 
gagatttttg 
attcacatta 
ttatctggag 
aaatggacaa 
tttttttttt 
tctctcagct 
cctgagtagc 
agagacgggg 
gcccacctag 
attactctta 
cctaatggcc 
agataaacag 
agactaggta 
acctgctttg 



atccatcttg 
catgtggctt 


atatatttag 
tggattgacc 
taaagcctaa 
catggaatat 
gagatgctgt 
gattagagaa 
tgccattttg 
ttactctttt 
ttatcttttg 
gcatagttat 
ctatacactt 
tagttttgga 
atttttaata 
actacattat 
ttttcaattt 
gcattttttg 
atgtctttac 
aattgttttt 
ctgagaccga 
taccgcaacc 
tgggattaca 
tttctccatg 
gcctcccaaa 
tttatcctga 
aaagcctgca 
actaaatata 
ggatgcaaat 
tggagcaagt 
tctttttatc
23940 

cacaagaaca 
24000 

caaatactta 
24060 

aagccagcag 
24120 

ctgctgtttc 
24180 

ctggtactat 
24240 

aagaggaaac 
24300 

gacaagagaa 
24360 

cttgccagtt 
24420 

cctctgcctt 
24480 

ctccaaggag 
24540 

ggggaagcac 
24600 

tttctctcaa 
24660 

ctttatatga 
24720 

ttactgctag 
24780 

tctgcctctg 
24840 

ataaaccagt 
24900 

acagaaaaaa 
24960 

gaccagctgg 
25020 

cctgcagtcc 
25080 

acattgcctg 
25140 

ggagaccaca 
25200 

gctaacatgt 
25260 

gacatgaatt 
25320 

gaatccatgt 
25380 

gataatcttt 
25440 

catctttatc 
25500 

aaatatacaa 
25560 

ttggaatcca 
25620 

gtactttcaa 
25680 



tctggggaaa 
atcttaggtc 
tccacttgag 
tggtttatca 
tggcccctcc 
tgtgatcatg 
tccctttctt 
taattgtcct 
aattggttta 
tctgtctgtt 
cacaagtcag 
ccttcctttt 
aattcttaag 
tgagctaaga 
tttagcagct 
ggacctcagc 
ggcaaaatgg 
agtaaactgg 
gtgagatgtc 
cgtccatttc 
caggtcttta 
gtgacaaaga 
gacttaggtt 
tcaatgactt 
ctctggaatc 
gtttggctga 
tgaaagacca 
catttaatag 
atgttaacag 
tgtggcccaa 



gatcctgagt 
agtaattaaa 
ctcttctttc 
tcgacttatt 
aggaatggtt 
ccaaagggct 
ggagactctc 
taggcagact 
aggacacagt 
ctgagttata 
atcatctaag 
ccatggcact 
cctctcctct 
gttacaaaac 
cactttataa 
tcatcctgag 
ctttaacctg 
ttatgatatc 
gtacaccaca 
cagagatctc 
cattcttttc 
ttagtgagtc 
ttatcaccta 
tcccaaaggc 
ccagcccagg 
gtttgagacc 
agggggatct 
ttaatataga 
atgcttatac 
aatccagagg 



aagtctcata 
ctatctggcc 
catcccagct 
cttactgact 
ttaggaggaa 
tggtggatat 
tcactagaac 
ctttttcaag 
tgcacatcct 
gcctttcaca 
tgatcctctt 
ctggcattcc 
ttaatccttc 
tggtttttag 
taaggatata 
gcagagagtc 
agggtaataa 
tgagtccctt 
atgtgcatca 
agcaagccac 
ctaagcagtt 
tcttagcact 
tgaggagctc 
acatagccag 
gtctcttcca 
gagctgaaac 
ttggcctcat 
gccttcagac 
aatgatttac 
cagccccaat 



gagttctcat 
cagtgtaata 
tggtacttct 
agctccccaa 
aggggataag 
tccatgcttc 
tttccagagg 
ctggtcccag 
tgccttgcct 
tcagtcctgt 
gaagcctctt 
aacaacactt 
gccattttta 
aaatctcctt 
tgatatattt 
ccattttaac 
ttaccaggaa 
ccctccctca 
aggagacgtg 
ttaccttccc 
cttagaggct 
tggagaagtc 
agaggataat 
ttgcagcaaa 
ttgtgggaca 
ttcatggaaa 
catcataata 
ccattatctc 
agttcactga 
gtgtagatga 



tcatttaaat 
ctgaaacttt 
ttggtcctag 
tacccagtag 
gagtaaaggg 
cctttctctc 
tgattcaggg 
agctttccct 
ctgctgctgt 
actccccaaa 
gtttaagatg 
taaataattt 
tgtattatta 
agcaaatgtt 
ctttggttcc 
attctgttac 
caaacagaaa 
tcctcacagg 
ccgattgatt 
agatggatgc 
atgggatcct 
aaaagataat 
gctttggtca 
gctaagccca 
tcatttctaa 
atagcaccag 
tcacccttat 

atttttcccc 
acacttttaa 
cattaactga 
tgtgagcaga gctagaactt gtgcggagac
25740 

caacacaggt ttctgagcag ggcttatagg 
25800 

tgattcaatg ttctattaat tcatgtctta 
25860 

aacacctaca gcctgttact gtaactttgc 
25920 

agaattattt ctggaaacca aataaccctc 
25980 

ttctccccaa accacatgga tatttgccaa 
26040 

caatggaaat ttgtcatggg atctgcatga 
26100 

cgttttcctc tcaagacaga gtcttgctat 
26160 

ctcggctcac tgcaacctct gcctcccagg 
26220 

agtagctggg attacaggtg cacaccacgc 
26280 

ggggtttcac catgttggcc aggctagtct 
26340 

cagcctccca aagcgctggg actacagcca 
26400 

tttataccta aattgtctcc aggagtgctt 
26460 

cacagtggct gacgcatata atcccaatat 
26520 

aagttaggag tctgagacta gcctgggcaa 
26580 

aaaagagaga gatagccagg catggtgttg 
26640 

tgaggcagga ggatcacttg agctcagaag 
26700 

actgctctcc agcctgattg acaggccaga 
26760 

tatttaagiza atttccaaac atagcagaaa 
26820 

caccaacagc tacttaagat agagtcatga 
26880 

gtgccaaccc aagccgcatc ttcttaggtg 
26940 

cagagcattg ccaggagcta gctcttccct 
27000 

gtagcccaca acttgctctt tctcctgcag 
27060 

caccgtggtt cttagtattt ggggtcttca 
27120 

ggtatgattc tctcttgtac ataaatactt 
27180 

tggtagctaa gcacagaagt ggctatataa 
27240 

taaacataaa agccaaaaga aatgtaaaac 
27300 

gtatcagtga tttctttcat gtaagccact 
27360 

agctggagta tatgtctctg taataattgg 
27420 

tggatgcaca tccatttcta agtggatgta 
27480 



cctgagtctg 


gagcctagag 


ttcttcggaa 


aagcagaggg 


gtcatgtgag 


acatattatc 


ggaagcaagc 


caacaggatt 


gcttctggca 


tgacagaccc 


agaattaatt 


tctggaagct 


acattctctc 


tcctttgttt 


tgtactctgt 


aattctccac 


tttccatatg 


tgaatagcac 


cagaatcaca 


gttctgtgtg 


tgtgtgtgtg 


gtagcccagg 


ctggagtaca 


gtggcgtaat 


tttaagcagt 


tctcctgcct 


cagcctcccg 


ctggcaaatt 


tttgtatttt 


tattagagat 


caagctcctg 


atctcgagac 


cagccctcct 


tgagccactg 


cacccagcca 


gttctgtgct 


aatagtccat 


taataggtat 


ttaggccagg 


tttgtgacac 


caaggtggga 


agactgcttg 


catagggaga 


ccctgtcttt 


acaaaaaaaa 


catgcttgta 


ttcctgccta 


cttgggggac 


ttcaaggtta 


ccgtgagcaa 


tgttcacgcc 


ccctgactct 


aaacaaaaac 


aaaaaacaaa 


atataagcat 


ggtttatcac 


tttgatatga 


attcagtaaa 


ttgttgtgtg 


gaaagctaag 


ctcctcactg 


gtgtcatcag 


ctacagcagg 


tcaagaacaa 


aagtcttgtt 


taagagcaca 


tctcttttat 


ttccctcctt 


tcttagggat 


ccacaaccct 


gctgtctgga 


aaaacccaaa 


ccaagaacta 


atgctgtgca 


agtcactttt 


ttaagggaaa 


tgacacaaat 


taaacaaaaa 


tattctatgt 


tcttgaaaca 


ctcttgacgt 


aaggtttaag 


atctattact 


tgtaacagga 


ccacatcatc 


attttgactt 


gatttctaag 


tctccatagt 


gaaaataata 


ccacttgcca 
tagtattttt
27540 

tctgaaggta 
27600 

catgtggcca 
27660 

acatagtacc 
27720 

ggttggaatg 
27780 

atgtgaccac 
27840 

gtgctaggtg 
27900 

cctctcattt 
27960 

cccctatgcc 
28020 

aagtacccaa 
28080 

attcttccag 
28140 

tacaaaggac 
28200 

acaatgtaag 
28260 

ttcatgagta 
28320 

gctcaaattc 
28380 

tggaaaaaat 
28440 

cagaagaaac 
28500 

ttgggcagga 
28560 

tcagagtgac 
28620 

ccaagaatgg 
28680 

atgattaaac 
28740 

aaagggatca 
28800 

gatccaaaat 
28860 

ggagattttc 
28920 

gactttcata 
28980 

gcatgtatgc 
29040 

aaatctcact 
29100 

aacagaataa 
29160 

tgtagcaatg 
29220 

aataaaaatt 
29280 



gtttgcctgg 
cactgcccag 
gttggaattg 
ctaaaaaaat 
gtaatttttg 
cagaacattt 
gtaggtgaag 
caggtctttg 
tacttaccat 
agatgtttac 
ggaaccgtag 
aatcgtattc 
ctactgctca 
actccaactg 
tcacagtgaa 
atcactttac 
atcatttttt 
gtttgccatg 
tccagacccc 
gatgtatttg 
gtactttgtt 
atgtatggtg 
catttctagg 
agatcttttc 
tattttctgt 
tcactgtcaa 
tcacttagcc 
tttggtgtgc 
tatttgtata 
caccttattc 



gtatcagaca 
tgtagtagcc 
agttgtgctg 
gtgaaacatt 
gttaaataaa 
taaattacac 
aaatgtgttc 
accccttgag 
tctcagctgg 
ttgagagtag 
atcttggtgc 
tctgtcacat 
taggctcaat 
ccgccttgtt 
caatttaagt 
tgtgtacttc 
caagtatcac 
attgagttaa 
accaggcctc 
cacctgaaga 
tttcgaagtt 
ggaggattgg 
tacacagtgt 
tgttaaactt 
tgtttttaaa 
aaattcccaa 
gacattccat 
attctttcag 
gatgtgatca 
cttatcattg 



aatcagctgt 
acgggccaca 
taagtttaaa 
tccttttagt 
ctctattaag 
atgtagatca 
atgttgtttg 
gttctctcag 
atcaaggtga 
tttattcctt 
ctatttgagc 
cctttttggc 
gcagtccacc 
atagggaagg 
ctaaagttca 
agacttcttg 
tttctttccc 
aggtaaccat 
ttactttccc 
aactctctga 
aaatttacag 
aggttggtgg 
gtcagctaga 
tcactactat 
atagttttca 
cactagaaaa 
gccctgacca 
actttttcct 
ttcctatatt 
ctttatggta 



gaagctgcaa 
tacggctact 
atacgtgctg 
aattatttat 
attaacttca 
cattatattt 
ggggatggtg 
gagaattctg 
gaacaatttg 
tcagctcctc 
cccaaaggat 
catgcctcaa 
ttcaaagcaa 
catcatgttg 
aaagtttcaa 
tactagtatt 
tcttgtcttc 
tgccttgatt 
caaccatttt 
atgttagatc 
ctaatgatcc 
gataggggtc 
tctgtttcta 
taatgctgta 
gaattatgca 
tcatgtagaa 
atcctactgc 
atacatttta 
gttattgatt 
ttctgtaata 



ggtctgcagg 

gagcacatga 

gattttgaag 

attgattaca 

ccttttaaaa 

ctattgatcg 

ttggggttgt 

atcagagaca 

aagttgctga 

agctctatac 

cagttagttt 

aagcagtccc

gagaaataat 

gagcctccca 

tggcatttgg 

ttactatagt

aggaactgca 

ctgctccact 

atcctcaagc 

tcagggtaca 

aagcagatag 

tctgtgaaga 

tataactttg 

tacaccaata 

agtaataagt 

taaaaatttt 

ttttcctaaa 

tatgtagaaa 

tttttcactt 

tgaatgtact 
ataatttatt taactatttt ccttattggg
29340 

gcttgtcaat ggcaacaaaa gccaaaattg 
29400 

tctgcacagc aaaacaaact accatcacac 
29460 

tttttgcaac ctactcatct gacaaaggcc 

29520 

aaatgtacaa gaaaaaaaca accccatcaa 
29580 

ctcaaaagaa gacatttacg cagccaaaag 
29640 

catcagagaa atgcaaatca aaaccacaat 
29700 

aatcattaaa aagtcaggaa acaacaggtg 
29760 

tttacactgt tggtggcagg agaatcactt 
29820 

gaggtggcgc cactgcactc cagcctgggc 
29880 

aaaaaaagga caccaaactt ctcaatctta 
29940 

tctctctcag acagagtcat cttttgctga 
30000 

attataatct cattaattgc agcaacacaa 
30060 

gatgacctaa tttgctttca ctcttccatc 
30120 

atctacctaa aatctatata taaaaaaatc 
30180 

aacacccacg tctaaaacca aatttgttta 
30240 

cattttgtca ctattttgtc agctggtata 
30300 

ttgttttttt ctggtacaaa cccaaataaa 
30360 

aataactcac tttctctata tatctccttc 
30420 

taaaagcatg catgataaat tgtactgaat 
30480 

aattttctgt gtctctgggg tcttacctat 
30540 

ttattaatat tcaatttcat tatcttcttt 
30600 

caataattta tttgtcaggt tgccaggtgc 
30660 

aattttactt taaatatttt tagaaaagag 
30720 

tattgttttt tcactgacat tttgtgaatt 
30780 

tcgaaatatg tgccacagac aattttgtta 
30840 

gagataaata ttcaatatac cttatatttc 
30900 

aaattgtata tcatataaat gataacaagt 

30960 

tcaaattaga aactttcata ggtaggaagt 
31020 

caagaaaatg tcattggcat tcaccatggt 
31080 



catttaagtt 


atttctagtt 


ttaaaaacat 


acaaatggga 


tctaattaaa 


ctaaagagct 


tgaatgggca 


gcctacagaa 


tgggagaaaa 


taatatccag 


aatctacaat 


gaactcaaac 


aaagtgggtg 


aaggatatga 


acagacactt 


acacatgaaa 


aaatgcctat 


cgtcactggc 


gagataccat 


ctcacaccag 


ttagaatggc 


ctggagagga 


tgtggagaaa 


taggaagact 


gaacccggga 


gggggaggtt 


gcagtgagcc 


gacagaacga 


gtactccatc 


tcaaaaaaaa 


atgttgtcat 


ctatgtggta 


tcttccataa 


tatgatctta 


cagtattttt 


tgtttatacc 


atgacaaaag 


acaactgatt 


tctccccttg 


atcacttata 


acatgatgat 


tctcaaattc 


cctcccttga 


attccagatc 


cttggagaca 


acactggacc 


agtcgtcctg 


tgtgactttc 


ccaatatcca 


cccagttaaa 


caatatttcc 


ttacaaacat 


caataaaagt 


aaaattctaa 


ttgctggaaa 


aatgggttag 


gttagttctt 


acaatattca 


ggtctggaca 


tactaggtat 


ttggggtcaa 


aataaacaag 


tttattaagc 


aacaattatg 


ttccctggta 


gtttcattgc 


ttctaaactt 


ctgtgtattt 


tttcatatcc 


gtctgttaaa 


tttcctaata 


attattatat 


gaaaaccctt 


aaaaatatga 


aatcattttt 


aataagaaga 


cagaaacagg 


gcattatcaa 


tgtcacacat 


ttttatacca 


actgtgccaa 


tcacaaaggc 


attcctttat 


cccttaactc 


aggggaagca 


tatattccct 


ttgaaaggtg 


actcttcaag 


cttaaaaaaa 


atggactgca 
aaacatttac aaacatagca tatttattgg gtacctttat gtttacataa atattgaaga
31140 

tatctcacat acctctttca atcagattat ctcactgaca tttattgacc actttctatg 

31200 

gggaaaac 

31208 



<210> 4 
<211> 489 
<212> PRT 
<213> Human 



<400> 4 



Val 


Ala 


Ala 


Leu 


Leu 


Gly 


Leu 


Leu 


1 








5 








Leu 


Tyr 


Leu 


His 


Arg 


Gln


Trp 


Leu 








20 










Cys 


Pro 


Pro 


Phe 


His 


Trp 


Leu 


Leu 






35 










40 


Asp 


Gln


Glu 


Leu 


Glu 


Arg 


Ile


Gln




50 










55 




Ala 


Cys 


Pro 


Trp 


Trp 


Leu 


Ser 


Gly 


65 










70 






Asp 


Pro 


Asp 


Tyr 


Leu 


Lys 


Val 


Ile










85 








Pro 


Arg 


Asn 


Tyr 


Lys 


Leu 


Met 


Thr 








100 










Leu 


Leu 


Asp 


Gly 


Gln


Thr 


Trp 


Phe 






115 










120 


Ala 


Phe 


His 


Tyr 


Asp 


Ile


Leu 


Lys 




130 










135 




Ser 


Val 


Gln


Ile


Met 


Leu 


Asp 


Arg 


145 










150 






Ser 


Ser 


Leu 


Glu 


Ile


Phe 


Gln


His 










165 








Ile


Met 


Lys 


Cys 


Ala 


Phe 


Ser 


Tyr 








180 










Asn 


Ser 


His 


Ser 


Tyr 


Ile


Gln


Ala 






195 










200 


Phe 


Tyr 


Arg 


Ala 


Arg 


Asn 


Val 


Phe 




210 










215 




Leu 


Ser 


Pro 


Glu 


Gly 


Arg 


Leu 


Phe 


225 










230 






Glu 


His 


Thr 


Asp 


Arg 


Val 


Ile


Gln










245 








Glu Gly 


Glu 


Leu 


Glu 


Lys 


Val 


Arg 








260 










Asp 


Val 


Leu 


Leu 


Phe 


Ala 


Lys 


Met 






275 










280 


Gln


Asp 


Leu 


Arg 


Ala 


Glu 


Val 


Asp 




290 










295 




Thr 


Thr 


Ala 


Ser 


Gly 


Val 


Ser 


Trp 


305 










310 






Pro 


Glu 


His 


Gln


His 


Arg 


Cys 


Arg 










325 








Asp 


Gly 


Ala 


Ser 


Ile


Thr 


Trp 


Glu 








340 










Thr 


Met 


Cys 


Ile


Lys 


Glu 


Ala 


Leu 






355 










360 



Leu 


Leu 


Leu 


Leu 


Lys 


Ala 


Ala 


Gln




10 










15 




Leu 


Arg 


Ala 


Leu 


Gln


Gln


Phe 


Pro 


25 










30 






Gly 


His 


Ser 


Arg 


Glu 


Phe 


Gln


Asn 










45 








Lys 


Trp 


Val 


Glu 


Lys 


Phe 


Pro 


Gly 








60 










Asn 


Lys 


Ala 


Arg 


Leu 


Leu 


Val 


Tyr 






75 










80 


Leu 


Gly 


Arg 


Ser 


Asp 


Pro 


Lys 


Ala 




90 










95 




Pro 


Trp 


Ile


Gly 


Tyr 


Gly 


Leu 


Leu 


105 










110 






Gln


His 


Arg 


Arg 


Met 


Leu 


Thr 


Pro 










125 








Pro 


Tyr 


Val 


Gly 


Leu 


Met 


Val 


Asp 








140 










Trp 


Glu 


Gln


Leu 


Ile


Ser 


Gln


Asp 






155 










160 


Val 


Ser 


Leu 


Met 


Thr 


Leu 


Asp 


Thr 




170 










175 




Gln


Gly 


Ser 


Val 


Gln


Leu 


Asp 


Arg 


185 










190 






Ile


Asn 


Asp 


Leu 


Asn 


Asn 


Leu 


Val 










205 








His 


Gln


Ser 


Asp 


Phe 


Leu 


Tyr 


Arg 








220 










His 


Arg 


Ala 


Cys 


Gln


Leu 


Ala 


His 






235 










240 


Gln


Arg 


Lys 


Ala 


Gln


Leu 


Gln


Gln




250 










255 




Arg 


Lys 


Arg 


Arg 


Leu 


Asp 


Phe 


Leu 


265 










270 






Glu 


Asn 


Gly 


Ser 


Ser 


Leu 


Ser 


Asp 










285 








Thr 


Phe 


Met 


Phe 


Glu 


Gly 


His 


Asp 








300 










Ile


Phe 


Tyr 


Ala 


Leu 


Ala 


Thr 


His 






315 










320 


Glu 


Glu 


Ile


Gln


Gly 


Leu 


Leu 


Gly 




330 










335 




His 


Leu 


Asp 


Gln


Met 


Pro 


Tyr 


Thr 


345 










350 






Arg 


Leu 


Tyr 


Pro 


Pro 


Val 


Pro 


Ser 



365 
Val


Thr 


Arg 


Gln


Leu 


Ser 


Lys 


Pro 


Val 


Thr 




370 










375 








Leu 


Pro 


Lys 


Gly 


Val 


Ile


Leu 


Phe 


Leu 


Ser 


385 










390 










Asn 


Pro 


Lys 


Val 


Trp 


Gln


Asn 


Pro 


Glu 


Val 










405 










410 


Ala 


Pro 


Asp 


Ser 


Ala 


Tyr 


His 


Ser 


His 


Ala 








420 










425 




Gly 


Ala 



Arg 


Asn 


Cys 


Ile


Gly 


Lys 


Gln


Phe 






435 










440 






Val 


Ala 


Val 


Ala 


Leu 


Thr 


Leu 


Leu 


Arg 


Phe 




450 










455 








Thr 


Arg 


Val 


Pro 


Ile


Pro 


Ile


Ala 


Arg Val 


465 










470 










Gly 


Ile


His 


Leu 


Arg 


Leu 


Arg 


Lys 


Leu 





485 
Phe


Pro 
380 


Asp 


Gly 


Arg 


Ser 


Ile


Tyr 


Gly 


Leu 


His 


Tyr


395 










400


Phe 


Asp 


Pro 


Phe 


Arg 
415 


Phe


Phe 


Leu 


Pro 


Phe 
430 


Ser 


Gly 


Ala 


Met 


Arg 
445 


Glu 


Leu 


Lys 


Glu 


Leu 
460 


Leu 


Pro 


Asp 


Pro 


Val 


Leu 


Lys 


Ser 


Lys 


Asn 


475 










480 
Claims Nos.: 17, 18 partially



Claim 17 refers to a pharmaceutical composition comprising an agent 
[...] such compounds are defined in the application. In consequence, the scope
[...] wording is, in fact, a mere recitation of the results to be achieved. But
a partial search has been carried out as far as the agent is an antibody 
against the polypeptide, a ribozyme, a probe or an antisense.
[...] claims, relating to inventions in respect of which no international
search report has been established need not be the subject of an 
[...] is advised that the EPO policy when acting as an International
Preliminary Examining Authority is normally not to carry out a 
preliminary examination on matter which has not been searched. This is 
the case irrespective of whether or not the claims are amended following 
receipt of the search report or during any Chapter II procedure. 